mirror of
https://github.com/stablyai/orca
synced 2026-04-21 14:17:16 +00:00
checkpoint: browser automation CDP bridge with CLI commands
Working: snapshot, click, goto, fill, type, select, scroll, back, reload, screenshot, eval, tab list, tab switch. Includes stale webContentsId fix, CLI dev-mode support (ORCA_USER_DATA_PATH), CDP command timeout, and 0-based tab index fix.
parent
3bbe9ed712
commit
99decd7f28
20 changed files with 2489 additions and 13 deletions
138
skills/orca-browser/SKILL.md
Normal file
|
|
@@ -0,0 +1,138 @@
---
name: orca-browser
description: >
  Use the Orca browser commands to automate the built-in browser.
  Triggers: "click on", "fill the form", "take a screenshot",
  "navigate to", "interact with the page", "extract text from",
  "snapshot the page", or any task involving browser automation.
allowed-tools: Bash(orca:*)
---

# Orca Browser Automation

Use these commands when the agent needs to interact with the built-in Orca browser: navigating pages, reading page content, clicking elements, filling forms, or verifying UI state.

## Core Loop

The browser automation workflow follows a snapshot-interact-re-snapshot loop:

1. **Snapshot** the page to see interactive elements and their refs.
2. **Interact** using refs (`@e1`, `@e3`, etc.) to click, fill, or select.
3. **Re-snapshot** after interactions to see the updated page state.

```bash
orca goto --url https://example.com --json
orca snapshot --json
# Read the refs from the snapshot output
orca click --element @e3 --json
orca snapshot --json
```
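
Reading the refs in step 2 can be automated. The sketch below assumes the snapshot result carries a `refs` array of `{ ref, role, name }` entries, which is the shape the integration test in this commit asserts on. Whether `orca snapshot --json` wraps that result in an outer envelope is not shown in this commit, so `findRef` is a hypothetical helper that takes the array directly, and the sample data is illustrative only.

```typescript
// Shape of one entry in the snapshot result's `refs` array, as asserted by
// the integration test in this commit.
type SnapshotRef = { ref: string; role: string; name: string }

// Hypothetical helper (not part of the Orca CLI): return the ref whose role
// and accessible name match, or undefined when no such element was found.
function findRef(refs: SnapshotRef[], role: string, name: string): string | undefined {
  return refs.find((entry) => entry.role === role && entry.name === name)?.ref
}

// Illustrative data; the role strings are assumptions.
const loginRefs: SnapshotRef[] = [
  { ref: '@e1', role: 'textbox', name: 'Email' },
  { ref: '@e2', role: 'textbox', name: 'Password' },
  { ref: '@e3', role: 'button', name: 'Sign In' }
]

// The value to pass to `orca click --element`.
const signIn = findRef(loginRefs, 'button', 'Sign In')
console.log(signIn)
```

Matching on role and name rather than hard-coding `@e3` keeps a script working when a later snapshot assigns different ref numbers.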

## Element Refs

Refs like `@e1`, `@e5` are short identifiers assigned to interactive page elements during a snapshot. They are:

- **Assigned by snapshot**: Run `orca snapshot` to get current refs.
- **Scoped to one tab**: Refs from one tab are not valid in another.
- **Invalidated by navigation**: If the page navigates after a snapshot, refs become stale. Re-snapshot to get fresh refs.
- **Invalidated by tab switch**: Switching tabs with `orca tab switch` invalidates refs. Re-snapshot after switching.

If a ref is stale, the command returns `browser_stale_ref`; re-snapshot and retry.
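
The re-snapshot-and-retry rule can be sketched as a small wrapper. This is an illustration under stated assumptions, not code from this commit: the `{ ok, result, error.code }` envelope is modeled on the RPC responses in the integration test (the CLI's `--json` output may differ), and `Runner`, `clickWithFreshRef`, and `pickRef` are hypothetical names. The runner is injected so the logic can be exercised without a running Orca editor.

```typescript
// Result envelope modeled on the RPC responses in this commit's integration
// test: `ok` plus either `result` or `error.code`. Assumed, not confirmed,
// for CLI output.
type CommandResult = { ok: true; result: unknown } | { ok: false; error: { code: string } }

// Hypothetical runner: takes CLI arguments (without the leading `orca`) and
// resolves to the parsed JSON output.
type Runner = (args: string[]) => Promise<CommandResult>

// Click by ref; on `browser_stale_ref`, take a fresh snapshot, let the caller
// derive the new ref from it, and retry exactly once.
async function clickWithFreshRef(
  run: Runner,
  ref: string,
  pickRef: (snapshot: unknown) => string
): Promise<CommandResult> {
  const first = await run(['click', '--element', ref, '--json'])
  if (first.ok || first.error.code !== 'browser_stale_ref') {
    return first
  }
  const snapshot = await run(['snapshot', '--json'])
  if (!snapshot.ok) {
    return snapshot
  }
  return run(['click', '--element', pickRef(snapshot.result), '--json'])
}

// Fake runner for demonstration: the first click fails as stale, every later
// call succeeds.
const calls: string[][] = []
const fakeRun: Runner = async (args) => {
  calls.push(args)
  if (args[0] === 'click' && calls.length === 1) {
    return { ok: false, error: { code: 'browser_stale_ref' } }
  }
  return { ok: true, result: { clicked: args[2] ?? null } }
}
const outcome = clickWithFreshRef(fakeRun, '@e3', () => '@e4')
```

Retrying only once is a deliberate choice in this sketch: if the ref is stale again immediately after a fresh snapshot, the page is probably still changing and the caller should decide what to do.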

## Commands

### Navigation

```bash
orca goto --url <url> [--json]    # Navigate to URL and wait for page load
orca back [--json]                # Go back in browser history
orca reload [--json]              # Reload the current page
```

### Observation

```bash
orca snapshot [--json]                            # Accessibility tree snapshot with element refs
orca screenshot [--format <png|jpeg>] [--json]    # Viewport screenshot (base64)
```

### Interaction

```bash
orca click --element <ref> [--json]                               # Click an element by ref
orca fill --element <ref> --value <text> [--json]                 # Clear and fill an input
orca type --input <text> [--json]                                 # Type at current focus (no element targeting)
orca select --element <ref> --value <value> [--json]              # Select dropdown option
orca scroll --direction <up|down> [--amount <pixels>] [--json]    # Scroll viewport
```

### Tab Management

```bash
orca tab list [--json]                  # List open browser tabs
orca tab switch --index <n> [--json]    # Switch active tab by 0-based index (invalidates refs)
```

### Page Inspection

```bash
orca eval --expression <js> [--json]    # Evaluate JS in page context
```

## `fill` vs `type`

- **`fill`** targets a specific element by ref, clears its value first, then enters text. Use for form fields.
- **`type`** types at whatever currently has focus. Use for search boxes or after clicking into an input.
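
The choice between the two commands comes down to whether a ref is available. A minimal sketch, assuming a hypothetical helper `textEntryArgs` that builds the argument list (without the leading `orca`) from the flags documented in this file:

```typescript
// Hypothetical helper, not part of the Orca CLI. With a ref it builds a `fill`
// call (targets the element and clears it first); without one it builds a
// `type` call (sends text to whatever currently has focus).
function textEntryArgs(text: string, ref?: string): string[] {
  return ref
    ? ['fill', '--element', ref, '--value', text, '--json']
    : ['type', '--input', text, '--json']
}

console.log(textEntryArgs('user@example.com', '@e1').join(' '))
console.log(textEntryArgs('orca browser').join(' '))
```

Returning an argument array rather than a single string avoids shell-quoting problems when the text contains spaces or quotes.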

## Error Codes and Recovery

| Error Code | Meaning | Recovery |
|-----------|---------|----------|
| `browser_no_tab` | No browser tab is open | Open a tab in the Orca UI, or use `orca tab list` to check |
| `browser_stale_ref` | Ref is invalid (page changed since snapshot, or no snapshot was taken) | Run `orca snapshot` to get fresh refs |
| `browser_ref_not_found` | Ref was never assigned (typo or out of range) | Run `orca snapshot` to see available refs |
| `browser_tab_not_found` | Tab index does not exist | Run `orca tab list` to see available tabs |
| `browser_navigation_failed` | URL could not be loaded | Check URL spelling, network connectivity |
| `browser_element_not_interactable` | Element is hidden or disabled | Re-snapshot; the element may have changed state |
| `browser_eval_error` | JavaScript threw an exception | Fix the expression and retry |
| `browser_cdp_error` | Internal browser control error | DevTools may be open; close them and retry |
| `browser_debugger_detached` | Tab was closed | Run `orca tab list` to find remaining tabs |
| `browser_timeout` | Operation timed out | Page may be slow to load; retry or check network |
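
For a machine-driven caller, the Recovery column can be encoded as a lookup from error code to the next command to run. The sketch below is one possible encoding, not part of this commit: `RECOVERY_COMMAND` and `recoveryFor` are hypothetical names, and codes whose recovery needs a changed input or human action map to `null`.

```typescript
// Next command (arguments without the leading `orca`) for each error code in
// the table. `null` means there is no automatic follow-up command.
const RECOVERY_COMMAND: Record<string, string[] | null> = {
  browser_no_tab: ['tab', 'list', '--json'],
  browser_stale_ref: ['snapshot', '--json'],
  browser_ref_not_found: ['snapshot', '--json'],
  browser_tab_not_found: ['tab', 'list', '--json'],
  browser_navigation_failed: null,
  browser_element_not_interactable: ['snapshot', '--json'],
  browser_eval_error: null,
  browser_cdp_error: null,
  browser_debugger_detached: ['tab', 'list', '--json'],
  browser_timeout: null
}

// Unknown codes are treated like codes with no automatic recovery.
function recoveryFor(code: string): string[] | null {
  return RECOVERY_COMMAND[code] ?? null
}

console.log(recoveryFor('browser_stale_ref'))
console.log(recoveryFor('browser_eval_error'))
```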

## Worked Example

An agent fills in a login form and verifies that the dashboard loads:

```bash
# Navigate to the login page
orca goto --url https://app.example.com/login --json

# See what's on the page
orca snapshot --json
# Output includes:
# [@e1] text input "Email"
# [@e2] text input "Password"
# [@e3] button "Sign In"

# Fill the form
orca fill --element @e1 --value "user@example.com" --json
orca fill --element @e2 --value "s3cret" --json

# Submit
orca click --element @e3 --json

# Verify the dashboard loaded
orca snapshot --json
# Output should show dashboard content, not the login form
```

## Agent Guidance

- Always use `--json` for machine-driven use.
- Always snapshot before interacting with elements.
- After navigation (`goto`, `back`, `reload`, clicking a link), re-snapshot to get fresh refs.
- After switching tabs, re-snapshot.
- If you get `browser_stale_ref`, re-snapshot and retry with the new refs.
- Use `orca tab list` before `orca tab switch` to know which tabs exist.
- Use `orca eval` as an escape hatch for interactions not covered by other commands.
- For full IDE/worktree/terminal commands, see the `orca-cli` skill.

@@ -167,6 +167,14 @@ Why: terminal handles are runtime-scoped and may go stale after reloads. If Orca
 - If the user asks for CLI UX feedback, test the public `orca` command first. Only inspect `src/cli` or use `node out/cli/index.js` if the public command is missing or the task is explicitly about implementation internals.
 - If a command fails, prefer retrying with the public `orca` command before concluding the CLI is broken, unless the failure already came from `orca` itself.
 
+## Browser Commands
+
+`orca` also supports browser automation commands for driving the built-in Orca browser. The core loop is: snapshot the page to get element refs → interact using refs → re-snapshot to see the updated state.
+
+Key commands: `orca snapshot`, `orca click --element @e3`, `orca fill --element @e5 --value "hello"`, `orca goto --url <url>`, `orca tab list`, `orca tab switch --index <n>`.
+
+For the full browser command reference, error codes, and worked examples, see the `orca-browser` skill.
+
 ## Important Constraints
 
 - Orca CLI only talks to a running Orca editor.

@@ -35,7 +35,23 @@ vi.mock('./runtime-client', () => {
   }
 })
 
-import { buildCurrentWorktreeSelector, main, normalizeWorktreeSelector } from './index'
+import {
+  buildCurrentWorktreeSelector,
+  COMMAND_SPECS,
+  main,
+  normalizeWorktreeSelector
+} from './index'
+
+describe('COMMAND_SPECS collision check', () => {
+  it('has no duplicate command paths', () => {
+    const seen = new Set<string>()
+    for (const spec of COMMAND_SPECS) {
+      const key = spec.path.join(' ')
+      expect(seen.has(key), `Duplicate COMMAND_SPECS path: "${key}"`).toBe(false)
+      seen.add(key)
+    }
+  })
+})
 
 describe('orca cli worktree awareness', () => {
   beforeEach(() => {

259
src/cli/index.ts
|
|
@@ -13,7 +13,20 @@ import type {
   RuntimeTerminalListResult,
   RuntimeTerminalShow,
   RuntimeTerminalSend,
-  RuntimeTerminalWait
+  RuntimeTerminalWait,
+  BrowserSnapshotResult,
+  BrowserClickResult,
+  BrowserGotoResult,
+  BrowserFillResult,
+  BrowserTypeResult,
+  BrowserSelectResult,
+  BrowserScrollResult,
+  BrowserBackResult,
+  BrowserReloadResult,
+  BrowserScreenshotResult,
+  BrowserEvalResult,
+  BrowserTabListResult,
+  BrowserTabSwitchResult
 } from '../shared/runtime-types'
 import {
   RuntimeClient,

@@ -39,7 +52,7 @@ type CommandSpec = {
 
 const DEFAULT_TERMINAL_WAIT_RPC_TIMEOUT_MS = 5 * 60 * 1000
 const GLOBAL_FLAGS = ['help', 'json']
-const COMMAND_SPECS: CommandSpec[] = [
+export const COMMAND_SPECS: CommandSpec[] = [
   {
     path: ['open'],
     summary: 'Launch Orca and wait for the runtime to be reachable',

@@ -169,6 +182,85 @@ const COMMAND_SPECS: CommandSpec[] = [
|
|||
summary: 'Stop terminals for a worktree',
|
||||
usage: 'orca terminal stop --worktree <selector> [--json]',
|
||||
allowedFlags: [...GLOBAL_FLAGS, 'worktree']
|
||||
},
|
||||
// ── Browser automation ──
|
||||
{
|
||||
path: ['snapshot'],
|
||||
summary: 'Capture an accessibility snapshot of the active browser tab',
|
||||
usage: 'orca snapshot [--json]',
|
||||
allowedFlags: [...GLOBAL_FLAGS]
|
||||
},
|
||||
{
|
||||
path: ['screenshot'],
|
||||
summary: 'Capture a viewport screenshot of the active browser tab',
|
||||
usage: 'orca screenshot [--format <png|jpeg>] [--json]',
|
||||
allowedFlags: [...GLOBAL_FLAGS, 'format']
|
||||
},
|
||||
{
|
||||
path: ['click'],
|
||||
summary: 'Click a browser element by ref',
|
||||
usage: 'orca click --element <ref> [--json]',
|
||||
allowedFlags: [...GLOBAL_FLAGS, 'element']
|
||||
},
|
||||
{
|
||||
path: ['fill'],
|
||||
summary: 'Clear and fill a browser input by ref',
|
||||
usage: 'orca fill --element <ref> --value <text> [--json]',
|
||||
allowedFlags: [...GLOBAL_FLAGS, 'element', 'value']
|
||||
},
|
||||
{
|
||||
path: ['type'],
|
||||
summary: 'Type text at the current browser focus',
|
||||
usage: 'orca type --input <text> [--json]',
|
||||
allowedFlags: [...GLOBAL_FLAGS, 'input']
|
||||
},
|
||||
{
|
||||
path: ['select'],
|
||||
summary: 'Select a dropdown option by ref',
|
||||
usage: 'orca select --element <ref> --value <value> [--json]',
|
||||
allowedFlags: [...GLOBAL_FLAGS, 'element', 'value']
|
||||
},
|
||||
{
|
||||
path: ['scroll'],
|
||||
summary: 'Scroll the browser viewport',
|
||||
usage: 'orca scroll --direction <up|down> [--amount <pixels>] [--json]',
|
||||
allowedFlags: [...GLOBAL_FLAGS, 'direction', 'amount']
|
||||
},
|
||||
{
|
||||
path: ['goto'],
|
||||
summary: 'Navigate the active browser tab to a URL',
|
||||
usage: 'orca goto --url <url> [--json]',
|
||||
allowedFlags: [...GLOBAL_FLAGS, 'url']
|
||||
},
|
||||
{
|
||||
path: ['back'],
|
||||
summary: 'Navigate back in browser history',
|
||||
usage: 'orca back [--json]',
|
||||
allowedFlags: [...GLOBAL_FLAGS]
|
||||
},
|
||||
{
|
||||
path: ['reload'],
|
||||
summary: 'Reload the active browser tab',
|
||||
usage: 'orca reload [--json]',
|
||||
allowedFlags: [...GLOBAL_FLAGS]
|
||||
},
|
||||
{
|
||||
path: ['eval'],
|
||||
summary: 'Evaluate JavaScript in the browser page context',
|
||||
usage: 'orca eval --expression <js> [--json]',
|
||||
allowedFlags: [...GLOBAL_FLAGS, 'expression']
|
||||
},
|
||||
{
|
||||
path: ['tab', 'list'],
|
||||
summary: 'List open browser tabs',
|
||||
usage: 'orca tab list [--json]',
|
||||
allowedFlags: [...GLOBAL_FLAGS]
|
||||
},
|
||||
{
|
||||
path: ['tab', 'switch'],
|
||||
summary: 'Switch the active browser tab',
|
||||
usage: 'orca tab switch --index <n> [--json]',
|
||||
allowedFlags: [...GLOBAL_FLAGS, 'index']
|
||||
}
|
||||
]
|
||||
|
||||
|
|
@@ -362,6 +454,96 @@ export async function main(argv = process.argv.slice(2), cwd = process.cwd()): P
|
|||
return printResult(result, json, (value) => `removed: ${value.removed}`)
|
||||
}
|
||||
|
||||
// ── Browser automation dispatch ──
|
||||
|
||||
if (matches(commandPath, ['snapshot'])) {
|
||||
const result = await client.call<BrowserSnapshotResult>('browser.snapshot')
|
||||
return printResult(result, json, formatSnapshot)
|
||||
}
|
||||
|
||||
if (matches(commandPath, ['screenshot'])) {
|
||||
const format = getOptionalStringFlag(parsed.flags, 'format')
|
||||
const result = await client.call<BrowserScreenshotResult>('browser.screenshot', {
|
||||
format: format === 'jpeg' ? 'jpeg' : undefined
|
||||
})
|
||||
return printResult(result, json, formatScreenshot)
|
||||
}
|
||||
|
||||
if (matches(commandPath, ['click'])) {
|
||||
const element = getRequiredStringFlag(parsed.flags, 'element')
|
||||
const result = await client.call<BrowserClickResult>('browser.click', { element })
|
||||
return printResult(result, json, (v) => `Clicked ${v.clicked}`)
|
||||
}
|
||||
|
||||
if (matches(commandPath, ['fill'])) {
|
||||
const element = getRequiredStringFlag(parsed.flags, 'element')
|
||||
const value = getRequiredStringFlag(parsed.flags, 'value')
|
||||
const result = await client.call<BrowserFillResult>('browser.fill', { element, value })
|
||||
return printResult(result, json, (v) => `Filled ${v.filled}`)
|
||||
}
|
||||
|
||||
if (matches(commandPath, ['type'])) {
|
||||
const input = getRequiredStringFlag(parsed.flags, 'input')
|
||||
const result = await client.call<BrowserTypeResult>('browser.type', { input })
|
||||
return printResult(result, json, () => 'Typed input')
|
||||
}
|
||||
|
||||
if (matches(commandPath, ['select'])) {
|
||||
const element = getRequiredStringFlag(parsed.flags, 'element')
|
||||
const value = getRequiredStringFlag(parsed.flags, 'value')
|
||||
const result = await client.call<BrowserSelectResult>('browser.select', { element, value })
|
||||
return printResult(result, json, (v) => `Selected ${v.selected}`)
|
||||
}
|
||||
|
||||
if (matches(commandPath, ['scroll'])) {
|
||||
const direction = getRequiredStringFlag(parsed.flags, 'direction')
|
||||
if (direction !== 'up' && direction !== 'down') {
|
||||
throw new RuntimeClientError('invalid_argument', '--direction must be "up" or "down"')
|
||||
}
|
||||
const amount = getOptionalPositiveIntegerFlag(parsed.flags, 'amount')
|
||||
const result = await client.call<BrowserScrollResult>('browser.scroll', {
|
||||
direction,
|
||||
amount
|
||||
})
|
||||
return printResult(result, json, (v) => `Scrolled ${v.scrolled}`)
|
||||
}
|
||||
|
||||
if (matches(commandPath, ['goto'])) {
|
||||
const url = getRequiredStringFlag(parsed.flags, 'url')
|
||||
const result = await client.call<BrowserGotoResult>('browser.goto', { url })
|
||||
return printResult(result, json, (v) => `Navigated to ${v.url} — ${v.title}`)
|
||||
}
|
||||
|
||||
if (matches(commandPath, ['back'])) {
|
||||
const result = await client.call<BrowserBackResult>('browser.back')
|
||||
return printResult(result, json, (v) => `Back to ${v.url} — ${v.title}`)
|
||||
}
|
||||
|
||||
if (matches(commandPath, ['reload'])) {
|
||||
const result = await client.call<BrowserReloadResult>('browser.reload')
|
||||
return printResult(result, json, (v) => `Reloaded ${v.url} — ${v.title}`)
|
||||
}
|
||||
|
||||
if (matches(commandPath, ['eval'])) {
|
||||
const expression = getRequiredStringFlag(parsed.flags, 'expression')
|
||||
const result = await client.call<BrowserEvalResult>('browser.eval', { expression })
|
||||
return printResult(result, json, (v) => v.value)
|
||||
}
|
||||
|
||||
if (matches(commandPath, ['tab', 'list'])) {
|
||||
const result = await client.call<BrowserTabListResult>('browser.tabList')
|
||||
return printResult(result, json, formatTabList)
|
||||
}
|
||||
|
||||
if (matches(commandPath, ['tab', 'switch'])) {
|
||||
const index = getOptionalNonNegativeIntegerFlag(parsed.flags, 'index')
|
||||
if (index === undefined) {
|
||||
throw new RuntimeClientError('invalid_argument', 'Missing required --index')
|
||||
}
|
||||
const result = await client.call<BrowserTabSwitchResult>('browser.tabSwitch', { index })
|
||||
return printResult(result, json, (v) => `Switched to tab ${v.switched}`)
|
||||
}
|
||||
|
||||
throw new RuntimeClientError('invalid_argument', `Unknown command: ${commandPath.join(' ')}`)
|
||||
} catch (error) {
|
||||
if (json) {
|
||||
|
|
@@ -446,7 +628,9 @@ export function findCommandSpec(commandPath: string[]): CommandSpec | undefined
 }
 
 function isCommandGroup(commandPath: string[]): boolean {
-  return commandPath.length === 1 && ['repo', 'worktree', 'terminal'].includes(commandPath[0])
+  return (
+    commandPath.length === 1 && ['repo', 'worktree', 'terminal', 'tab'].includes(commandPath[0])
+  )
 }
 
 function getRequiredStringFlag(flags: Map<string, string | boolean>, name: string): string {

@@ -562,6 +746,20 @@ function getOptionalPositiveIntegerFlag(
|
|||
return value
|
||||
}
|
||||
|
||||
function getOptionalNonNegativeIntegerFlag(
|
||||
flags: Map<string, string | boolean>,
|
||||
name: string
|
||||
): number | undefined {
|
||||
const value = getOptionalNumberFlag(flags, name)
|
||||
if (value === undefined) {
|
||||
return undefined
|
||||
}
|
||||
if (!Number.isInteger(value) || value < 0) {
|
||||
throw new RuntimeClientError('invalid_argument', `Invalid non-negative integer for --${name}`)
|
||||
}
|
||||
return value
|
||||
}
|
||||
|
||||
function getOptionalNullableNumberFlag(
|
||||
flags: Map<string, string | boolean>,
|
||||
name: string
|
||||
|
|
@@ -737,6 +935,27 @@ function formatWorktreeShow(result: { worktree: RuntimeWorktreeRecord }): string
     .join('\n')
 }
 
+function formatSnapshot(result: BrowserSnapshotResult): string {
+  const header = `${result.title} — ${result.url}\n`
+  return header + result.snapshot
+}
+
+function formatScreenshot(result: BrowserScreenshotResult): string {
+  return `Screenshot captured (${result.format}, ${Math.round(result.data.length * 0.75)} bytes)`
+}
+
+function formatTabList(result: BrowserTabListResult): string {
+  if (result.tabs.length === 0) {
+    return 'No browser tabs open.'
+  }
+  return result.tabs
+    .map((t) => {
+      const marker = t.active ? '* ' : ' '
+      return `${marker}[${t.index}] ${t.title} — ${t.url}`
+    })
+    .join('\n')
+}
+
 function printHelp(commandPath: string[] = []): void {
   const exactSpec = findCommandSpec(commandPath)
   if (exactSpec) {

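
A note on the byte count in `formatScreenshot`: multiplying the base64 length by 0.75 overstates the decoded size by one or two bytes whenever the data ends in `=` padding. If an exact figure were wanted, it could be computed as in the sketch below, which assumes standard padded base64 with no embedded whitespace; `base64DecodedLength` is a hypothetical helper and is not part of this commit.

```typescript
// Exact decoded byte length of a padded base64 string: every 4 characters
// encode 3 bytes, and each trailing '=' stands for one byte that is absent.
function base64DecodedLength(data: string): number {
  const padding = data.endsWith('==') ? 2 : data.endsWith('=') ? 1 : 0
  return (data.length / 4) * 3 - padding
}

const helloBase64 = 'aGVsbG8=' // base64 of the 5-byte string "hello"
console.log(base64DecodedLength(helloBase64)) // 5
console.log(Math.round(helloBase64.length * 0.75)) // 6, the estimate used in formatScreenshot
```

For a human-readable status line the approximation is harmless; the difference only matters if the number is compared against an actual file size.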
@@ -785,6 +1004,21 @@ Terminals:
|
|||
terminal wait Wait for a terminal condition
|
||||
terminal stop Stop terminals for a worktree
|
||||
|
||||
Browser:
|
||||
snapshot Capture an accessibility snapshot of the active browser tab
|
||||
screenshot Capture a viewport screenshot of the active browser tab
|
||||
click Click a browser element by ref
|
||||
fill Clear and fill a browser input by ref
|
||||
type Type text at the current browser focus
|
||||
select Select a dropdown option by ref
|
||||
scroll Scroll the browser viewport
|
||||
goto Navigate the active browser tab to a URL
|
||||
back Navigate back in browser history
|
||||
reload Reload the active browser tab
|
||||
eval Evaluate JavaScript in the browser page context
|
||||
tab list List open browser tabs
|
||||
tab switch Switch the active browser tab
|
||||
|
||||
Common Commands:
|
||||
orca open [--json]
|
||||
orca status [--json]
|
||||
|
|
@@ -840,7 +1074,12 @@ Examples:
   $ orca worktree ps --limit 10
   $ orca terminal list --worktree path:/Users/me/orca/workspaces/orca/cli-test-1 --json
   $ orca terminal send --terminal term_123 --text "hi" --enter
-  $ orca terminal wait --terminal term_123 --for exit --timeout-ms 60000 --json`)
+  $ orca terminal wait --terminal term_123 --for exit --timeout-ms 60000 --json
+  $ orca goto --url https://example.com
+  $ orca snapshot
+  $ orca click --element @e3
+  $ orca fill --element @e5 --value "hello"
+  $ orca tab list --json`)
 }
 
 function formatCommandHelp(spec: CommandSpec): string {

@@ -902,7 +1141,17 @@ function formatFlagHelp(flag: string): string {
     text: '--text <text> Text to send to the terminal',
     'timeout-ms': '--timeout-ms <ms> Maximum wait time before timing out',
     worktree:
-      '--worktree <selector> Worktree selector such as id:<id>, branch:<branch>, issue:<number>, path:<path>, or active/current'
+      '--worktree <selector> Worktree selector such as id:<id>, branch:<branch>, issue:<number>, path:<path>, or active/current',
+    // Browser automation flags
+    element: '--element <ref> Element ref from snapshot (e.g. @e3)',
+    url: '--url <url> URL to navigate to',
+    value: '--value <text> Value to fill or select',
+    input: '--input <text> Text to type at current focus',
+    expression: '--expression <js> JavaScript expression to evaluate',
+    direction: '--direction <up|down> Scroll direction',
+    amount: '--amount <pixels> Scroll distance in pixels',
+    index: '--index <n> Tab index to switch to',
+    format: '--format <png|jpeg> Screenshot image format'
   }
 
   return helpByFlag[flag] ?? `--${flag}`

@@ -383,6 +383,12 @@ export function getDefaultUserDataPath(
   platform: NodeJS.Platform = process.platform,
   homeDir = homedir()
 ): string {
+  // Why: in dev mode, the Electron app writes runtime metadata to `orca-dev`
+  // instead of `orca` to avoid clobbering the production app's metadata. The
+  // CLI needs to find the same metadata file, so respect this env var override.
+  if (process.env.ORCA_USER_DATA_PATH) {
+    return process.env.ORCA_USER_DATA_PATH
+  }
   if (platform === 'darwin') {
     return join(homeDir, 'Library', 'Application Support', 'orca')
   }

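
The override added in this hunk uses a truthiness check, so an unset or empty `ORCA_USER_DATA_PATH` falls through to the platform default. A minimal standalone sketch of that behavior follows; `resolveUserDataPath` is a hypothetical name, the environment is passed in as a plain object so it can be tested, and the example paths (including the location of the `orca-dev` directory) are assumptions for illustration.

```typescript
// Hypothetical helper mirroring the override: a non-empty ORCA_USER_DATA_PATH
// wins; otherwise the caller-supplied platform default is used.
function resolveUserDataPath(
  env: Record<string, string | undefined>,
  platformDefault: string
): string {
  return env.ORCA_USER_DATA_PATH ? env.ORCA_USER_DATA_PATH : platformDefault
}

// Example paths are assumptions, not values taken from the Orca source.
const prodDefault = '/Users/me/Library/Application Support/orca'
const devPath = '/Users/me/Library/Application Support/orca-dev'

console.log(resolveUserDataPath({}, prodDefault))
console.log(resolveUserDataPath({ ORCA_USER_DATA_PATH: devPath }, prodDefault))
```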
@@ -71,7 +71,7 @@ function safeOrigin(rawUrl: string): string {
   }
 }
 
-class BrowserManager {
+export class BrowserManager {
   private readonly webContentsIdByTabId = new Map<string, number>()
   // Why: reverse map enables O(1) guest→tab lookups instead of O(N) linear
   // scans on every mouse event, load failure, permission, and popup event.

504
src/main/browser/cdp-bridge-integration.test.ts
Normal file
|
|
@@ -0,0 +1,504 @@
|
|||
/* eslint-disable max-lines -- Why: integration test covering the full browser automation pipeline end-to-end. */
|
||||
import { mkdtempSync } from 'fs'
|
||||
import { tmpdir } from 'os'
|
||||
import { join } from 'path'
|
||||
import { createConnection } from 'net'
|
||||
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest'
|
||||
|
||||
// ── Electron mocks ──
|
||||
|
||||
const { webContentsFromIdMock } = vi.hoisted(() => ({
|
||||
webContentsFromIdMock: vi.fn()
|
||||
}))
|
||||
|
||||
vi.mock('electron', () => ({
|
||||
webContents: { fromId: webContentsFromIdMock },
|
||||
shell: { openExternal: vi.fn() },
|
||||
ipcMain: { handle: vi.fn(), removeHandler: vi.fn(), on: vi.fn() },
|
||||
app: { getPath: vi.fn(() => '/tmp'), isPackaged: false }
|
||||
}))
|
||||
|
||||
vi.mock('../git/worktree', () => ({
|
||||
listWorktrees: vi.fn().mockResolvedValue([])
|
||||
}))
|
||||
|
||||
import { BrowserManager } from './browser-manager'
|
||||
import { CdpBridge } from './cdp-bridge'
|
||||
import { OrcaRuntimeService } from '../runtime/orca-runtime'
|
||||
import { OrcaRuntimeRpcServer } from '../runtime/runtime-rpc'
|
||||
import { readRuntimeMetadata } from '../runtime/runtime-metadata'
|
||||
|
||||
// ── CDP response builders ──
|
||||
|
||||
type AXNode = {
|
||||
nodeId: string
|
||||
backendDOMNodeId?: number
|
||||
role?: { type: string; value: string }
|
||||
name?: { type: string; value: string }
|
||||
properties?: { name: string; value: { type: string; value: unknown } }[]
|
||||
childIds?: string[]
|
||||
ignored?: boolean
|
||||
}
|
||||
|
||||
function axNode(
|
||||
id: string,
|
||||
role: string,
|
||||
name: string,
|
||||
opts?: { childIds?: string[]; backendDOMNodeId?: number }
|
||||
): AXNode {
|
||||
return {
|
||||
nodeId: id,
|
||||
backendDOMNodeId: opts?.backendDOMNodeId ?? parseInt(id, 10) * 100,
|
||||
role: { type: 'role', value: role },
|
||||
name: { type: 'computedString', value: name },
|
||||
childIds: opts?.childIds
|
||||
}
|
||||
}
|
||||
|
||||
const EXAMPLE_COM_TREE: AXNode[] = [
|
||||
axNode('1', 'WebArea', 'Example Domain', { childIds: ['2', '3', '4'] }),
|
||||
axNode('2', 'heading', 'Example Domain'),
|
||||
axNode('3', 'staticText', 'This domain is for use in illustrative examples.'),
|
||||
axNode('4', 'link', 'More information...', { backendDOMNodeId: 400 })
|
||||
]
|
||||
|
||||
const SEARCH_PAGE_TREE: AXNode[] = [
|
||||
axNode('1', 'WebArea', 'Search', { childIds: ['2', '3', '4', '5'] }),
|
||||
axNode('2', 'navigation', 'Main Nav', { childIds: ['3'] }),
|
||||
axNode('3', 'link', 'Home', { backendDOMNodeId: 300 }),
|
||||
axNode('4', 'textbox', 'Search query', { backendDOMNodeId: 400 }),
|
||||
axNode('5', 'button', 'Search', { backendDOMNodeId: 500 })
|
||||
]
|
||||
|
||||
// ── Mock WebContents factory ──
|
||||
|
||||
function createMockGuest(id: number, url: string, title: string) {
|
||||
let currentUrl = url
|
||||
let currentTitle = title
|
||||
let currentTree = EXAMPLE_COM_TREE
|
||||
let navHistoryId = 1
|
||||
|
||||
const sendCommandMock = vi.fn(async (method: string, params?: Record<string, unknown>) => {
|
||||
switch (method) {
|
||||
case 'Page.enable':
|
||||
case 'DOM.enable':
|
||||
case 'Accessibility.enable':
|
||||
return {}
|
||||
case 'Accessibility.getFullAXTree':
|
||||
return { nodes: currentTree }
|
||||
case 'Page.getNavigationHistory':
|
||||
return {
|
||||
entries: [{ id: navHistoryId, url: currentUrl }],
|
||||
currentIndex: 0
|
||||
}
|
||||
case 'Page.navigate': {
|
||||
const targetUrl = (params as { url: string }).url
|
||||
if (targetUrl.includes('nonexistent.invalid')) {
|
||||
return { errorText: 'net::ERR_NAME_NOT_RESOLVED' }
|
||||
}
|
||||
navHistoryId++
|
||||
currentUrl = targetUrl
|
||||
if (targetUrl.includes('search.example.com')) {
|
||||
currentTitle = 'Search'
|
||||
currentTree = SEARCH_PAGE_TREE
|
||||
} else {
|
||||
currentTitle = 'Example Domain'
|
||||
currentTree = EXAMPLE_COM_TREE
|
||||
}
|
||||
return {}
|
||||
}
|
||||
case 'Runtime.evaluate': {
|
||||
const expr = (params as { expression: string }).expression
|
||||
if (expr === 'document.readyState') {
|
||||
return { result: { value: 'complete' } }
|
||||
}
|
||||
if (expr.includes('innerWidth')) {
|
||||
return { result: { value: JSON.stringify({ w: 1280, h: 720 }) } }
|
||||
}
|
||||
// eslint-disable-next-line no-eval
|
||||
return { result: { value: String(eval(expr)), type: 'string' } }
|
||||
}
|
||||
case 'DOM.scrollIntoViewIfNeeded':
|
||||
return {}
|
||||
case 'DOM.getBoxModel':
|
||||
return { model: { content: [100, 200, 300, 200, 300, 250, 100, 250] } }
|
||||
case 'Input.dispatchMouseEvent':
|
||||
return {}
|
||||
case 'Input.insertText':
|
||||
return {}
|
||||
case 'Input.dispatchKeyEvent':
|
||||
return {}
|
||||
case 'DOM.focus':
|
||||
return {}
|
||||
case 'DOM.describeNode':
|
||||
return { node: { nodeId: 1 } }
|
||||
case 'DOM.requestNode':
|
||||
return { nodeId: 1 }
|
||||
case 'DOM.resolveNode':
|
||||
return { object: { objectId: 'obj-1' } }
|
||||
case 'Runtime.callFunctionOn':
|
||||
return { result: { value: undefined } }
|
||||
case 'Page.captureScreenshot':
|
||||
return {
|
||||
data: 'iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg=='
|
||||
}
|
||||
case 'Page.reload':
|
||||
return {}
|
||||
default:
|
||||
throw new Error(`Unexpected CDP method: ${method}`)
|
||||
}
|
||||
})
|
||||
|
||||
const debuggerListeners = new Map<string, ((...args: unknown[]) => void)[]>()
|
||||
|
||||
const guest = {
|
||||
id,
|
||||
isDestroyed: vi.fn(() => false),
|
||||
getType: vi.fn(() => 'webview'),
|
||||
getURL: vi.fn(() => currentUrl),
|
||||
getTitle: vi.fn(() => currentTitle),
|
||||
setBackgroundThrottling: vi.fn(),
|
||||
setWindowOpenHandler: vi.fn(),
|
||||
on: vi.fn(),
|
||||
off: vi.fn(),
|
||||
debugger: {
|
||||
attach: vi.fn(),
|
||||
detach: vi.fn(),
|
||||
sendCommand: sendCommandMock,
|
||||
on: vi.fn((event: string, handler: (...args: unknown[]) => void) => {
|
||||
const handlers = debuggerListeners.get(event) ?? []
|
||||
handlers.push(handler)
|
||||
debuggerListeners.set(event, handlers)
|
||||
}),
|
||||
off: vi.fn()
|
||||
}
|
||||
}
|
||||
|
||||
return { guest, sendCommandMock }
|
||||
}
|
||||
|
||||
// ── RPC helper ──
|
||||
|
||||
async function sendRequest(
|
||||
endpoint: string,
|
||||
request: Record<string, unknown>
|
||||
): Promise<Record<string, unknown>> {
|
||||
return await new Promise((resolve, reject) => {
|
||||
const socket = createConnection(endpoint)
|
||||
let buffer = ''
|
||||
socket.setEncoding('utf8')
|
||||
socket.once('error', reject)
|
||||
socket.on('data', (chunk) => {
|
||||
buffer += chunk
|
||||
const newlineIndex = buffer.indexOf('\n')
|
||||
if (newlineIndex === -1) {
|
||||
return
|
||||
}
|
||||
const message = buffer.slice(0, newlineIndex)
|
||||
socket.end()
|
||||
resolve(JSON.parse(message) as Record<string, unknown>)
|
||||
})
|
||||
socket.on('connect', () => {
|
||||
socket.write(`${JSON.stringify(request)}\n`)
|
||||
})
|
||||
})
|
||||
}
|
||||
|
||||
// ── Tests ──
|
||||
|
||||
describe('Browser automation pipeline (integration)', () => {
|
||||
let server: OrcaRuntimeRpcServer
|
||||
let endpoint: string
|
||||
let authToken: string
|
||||
|
||||
const GUEST_WC_ID = 5001
|
||||
const RENDERER_WC_ID = 1
|
||||
|
||||
beforeEach(async () => {
|
||||
const { guest } = createMockGuest(GUEST_WC_ID, 'https://example.com', 'Example Domain')
|
||||
webContentsFromIdMock.mockImplementation((id: number) => {
|
||||
if (id === GUEST_WC_ID) {
|
||||
return guest
|
||||
}
|
||||
return null
|
||||
})
|
||||
|
||||
const browserManager = new BrowserManager()
|
||||
// Simulate the attach-time policy (normally done in will-attach-webview)
|
||||
browserManager.attachGuestPolicies(guest as never)
|
||||
browserManager.registerGuest({
|
||||
browserPageId: 'page-1',
|
||||
webContentsId: GUEST_WC_ID,
|
||||
rendererWebContentsId: RENDERER_WC_ID
|
||||
})
|
||||
|
||||
const cdpBridge = new CdpBridge(browserManager)
|
||||
cdpBridge.setActiveTab(GUEST_WC_ID)
|
||||
|
||||
const userDataPath = mkdtempSync(join(tmpdir(), 'browser-e2e-'))
|
||||
const runtime = new OrcaRuntimeService()
|
||||
runtime.setCdpBridge(cdpBridge)
|
||||
|
||||
server = new OrcaRuntimeRpcServer({ runtime, userDataPath })
|
||||
await server.start()
|
||||
|
||||
const metadata = readRuntimeMetadata(userDataPath)!
|
||||
endpoint = metadata.transport!.endpoint
|
||||
authToken = metadata.authToken!
|
||||
})
|
||||
|
||||
afterEach(async () => {
|
||||
await server.stop()
|
||||
})
|
||||
|
||||
async function rpc(method: string, params?: Record<string, unknown>) {
|
||||
const response = await sendRequest(endpoint, {
|
||||
id: `req_${method}`,
|
||||
authToken,
|
||||
method,
|
||||
...(params ? { params } : {})
|
||||
})
|
||||
return response
|
||||
}
|
||||
|
||||
// ── Snapshot ──
|
||||
|
||||
it('takes a snapshot and returns refs for interactive elements', async () => {
|
||||
const res = await rpc('browser.snapshot')
|
||||
expect(res.ok).toBe(true)
|
||||
|
||||
const result = res.result as {
|
||||
snapshot: string
|
||||
refs: { ref: string; role: string; name: string }[]
|
||||
url: string
|
||||
title: string
|
||||
}
|
||||
expect(result.url).toBe('https://example.com')
|
||||
expect(result.title).toBe('Example Domain')
|
||||
expect(result.snapshot).toContain('heading "Example Domain"')
|
||||
expect(result.snapshot).toContain('link "More information..."')
|
||||
expect(result.refs).toHaveLength(1)
|
||||
expect(result.refs[0]).toMatchObject({
|
||||
ref: '@e1',
|
||||
role: 'link',
|
||||
name: 'More information...'
|
||||
})
|
||||
})
|
||||
|
||||
// ── Click ──
|
||||
|
||||
it('clicks an element by ref after snapshot', async () => {
|
||||
await rpc('browser.snapshot')
|
||||
|
||||
const res = await rpc('browser.click', { element: '@e1' })
|
||||
expect(res.ok).toBe(true)
|
||||
expect((res.result as { clicked: string }).clicked).toBe('@e1')
|
||||
})
|
||||
|
||||
it('returns error when clicking without a prior snapshot', async () => {
|
||||
const res = await rpc('browser.click', { element: '@e1' })
|
||||
expect(res.ok).toBe(false)
|
||||
expect((res.error as { code: string }).code).toBe('browser_stale_ref')
|
||||
})
|
||||
|
||||
it('returns error for non-existent ref', async () => {
|
||||
await rpc('browser.snapshot')
|
||||
|
||||
const res = await rpc('browser.click', { element: '@e999' })
|
||||
expect(res.ok).toBe(false)
|
||||
expect((res.error as { code: string }).code).toBe('browser_ref_not_found')
|
||||
})
|
||||
|
||||
// ── Navigation ──
|
||||
|
||||
it('navigates to a URL and invalidates refs', async () => {
|
||||
await rpc('browser.snapshot')
|
||||
|
||||
const gotoRes = await rpc('browser.goto', { url: 'https://search.example.com' })
|
||||
expect(gotoRes.ok).toBe(true)
|
||||
const gotoResult = gotoRes.result as { url: string; title: string }
|
||||
expect(gotoResult.url).toBe('https://search.example.com')
|
||||
expect(gotoResult.title).toBe('Search')
|
||||
|
||||
// Old refs should be stale after navigation
|
||||
const clickRes = await rpc('browser.click', { element: '@e1' })
|
||||
expect(clickRes.ok).toBe(false)
|
||||
expect((clickRes.error as { code: string }).code).toBe('browser_stale_ref')
|
||||
|
||||
// Re-snapshot should work and show new page
|
||||
const snapRes = await rpc('browser.snapshot')
|
||||
expect(snapRes.ok).toBe(true)
|
||||
const snapResult = snapRes.result as { snapshot: string; refs: { name: string }[] }
|
||||
expect(snapResult.snapshot).toContain('Search')
|
||||
expect(snapResult.refs.map((r) => r.name)).toContain('Search')
|
||||
expect(snapResult.refs.map((r) => r.name)).toContain('Home')
|
||||
})
|
||||
|
||||
it('returns error for failed navigation', async () => {
|
||||
const res = await rpc('browser.goto', { url: 'https://nonexistent.invalid' })
|
||||
expect(res.ok).toBe(false)
|
||||
expect((res.error as { code: string }).code).toBe('browser_navigation_failed')
|
||||
})
|
||||
|
||||
// ── Fill ──
|
||||
|
||||
it('fills an input by ref', async () => {
|
||||
await rpc('browser.goto', { url: 'https://search.example.com' })
|
||||
await rpc('browser.snapshot')
|
||||
|
||||
// @e2 should be the textbox "Search query" on the search page
|
||||
const res = await rpc('browser.fill', { element: '@e2', value: 'hello world' })
|
||||
expect(res.ok).toBe(true)
|
||||
expect((res.result as { filled: string }).filled).toBe('@e2')
|
||||
})
|
||||
|
||||
// ── Type ──
|
||||
|
||||
it('types text at current focus', async () => {
|
||||
const res = await rpc('browser.type', { input: 'some text' })
|
||||
expect(res.ok).toBe(true)
|
||||
expect((res.result as { typed: boolean }).typed).toBe(true)
|
||||
})
|
||||
|
||||
// ── Select ──
|
||||
|
||||
it('selects a dropdown option by ref', async () => {
|
||||
await rpc('browser.goto', { url: 'https://search.example.com' })
|
||||
await rpc('browser.snapshot')
|
||||
|
||||
const res = await rpc('browser.select', { element: '@e2', value: 'option-1' })
|
||||
expect(res.ok).toBe(true)
|
||||
expect((res.result as { selected: string }).selected).toBe('@e2')
|
||||
})
|
||||
|
||||
// ── Scroll ──
|
||||
|
||||
it('scrolls the viewport', async () => {
|
||||
const res = await rpc('browser.scroll', { direction: 'down' })
|
||||
expect(res.ok).toBe(true)
|
||||
expect((res.result as { scrolled: string }).scrolled).toBe('down')
|
||||
|
||||
const res2 = await rpc('browser.scroll', { direction: 'up', amount: 200 })
|
||||
expect(res2.ok).toBe(true)
|
||||
expect((res2.result as { scrolled: string }).scrolled).toBe('up')
|
||||
})
|
||||
|
||||
// ── Reload ──
|
||||
|
||||
it('reloads the page', async () => {
|
||||
const res = await rpc('browser.reload')
|
||||
expect(res.ok).toBe(true)
|
||||
expect((res.result as { url: string }).url).toBe('https://example.com')
|
||||
})
|
||||
|
||||
// ── Screenshot ──
|
||||
|
||||
it('captures a screenshot', async () => {
|
||||
const res = await rpc('browser.screenshot', { format: 'png' })
|
||||
expect(res.ok).toBe(true)
|
||||
const result = res.result as { data: string; format: string }
|
||||
expect(result.format).toBe('png')
|
||||
expect(result.data.length).toBeGreaterThan(0)
|
||||
})
|
||||
|
||||
// ── Eval ──
|
||||
|
||||
it('evaluates JavaScript in the page context', async () => {
|
||||
const res = await rpc('browser.eval', { expression: '2 + 2' })
|
||||
expect(res.ok).toBe(true)
|
||||
expect((res.result as { value: string }).value).toBe('4')
|
||||
})
|
||||
|
||||
// ── Tab management ──
|
||||
|
||||
it('lists open tabs', async () => {
|
||||
const res = await rpc('browser.tabList')
|
||||
expect(res.ok).toBe(true)
|
||||
const result = res.result as { tabs: { index: number; url: string; active: boolean }[] }
|
||||
expect(result.tabs).toHaveLength(1)
|
||||
expect(result.tabs[0]).toMatchObject({
|
||||
index: 0,
|
||||
url: 'https://example.com',
|
||||
active: true
|
||||
})
|
||||
})
|
||||
|
||||
it('returns error for out-of-range tab switch', async () => {
|
||||
const res = await rpc('browser.tabSwitch', { index: 5 })
|
||||
expect(res.ok).toBe(false)
|
||||
expect((res.error as { code: string }).code).toBe('browser_tab_not_found')
|
||||
})
|
||||
|
||||
// ── Full agent workflow simulation ──
|
||||
|
||||
it('simulates a complete agent workflow: navigate → snapshot → interact → re-snapshot', async () => {
|
||||
// 1. Navigate to search page
|
||||
const gotoRes = await rpc('browser.goto', { url: 'https://search.example.com' })
|
||||
expect(gotoRes.ok).toBe(true)
|
||||
|
||||
// 2. Snapshot the page
|
||||
const snap1 = await rpc('browser.snapshot')
|
||||
expect(snap1.ok).toBe(true)
|
||||
const snap1Result = snap1.result as {
|
||||
snapshot: string
|
||||
refs: { ref: string; role: string; name: string }[]
|
||||
}
|
||||
|
||||
// Verify we see the search page structure
|
||||
expect(snap1Result.snapshot).toContain('[Main Nav]')
|
||||
expect(snap1Result.snapshot).toContain('text input "Search query"')
|
||||
expect(snap1Result.snapshot).toContain('button "Search"')
|
||||
|
||||
// 3. Fill the search input
|
||||
const searchInput = snap1Result.refs.find((r) => r.name === 'Search query')
|
||||
expect(searchInput).toBeDefined()
|
||||
const fillRes = await rpc('browser.fill', {
|
||||
element: searchInput!.ref,
|
||||
value: 'integration testing'
|
||||
})
|
||||
expect(fillRes.ok).toBe(true)
|
||||
|
||||
// 4. Click the search button
|
||||
const searchBtn = snap1Result.refs.find((r) => r.name === 'Search')
|
||||
expect(searchBtn).toBeDefined()
|
||||
const clickRes = await rpc('browser.click', { element: searchBtn!.ref })
|
||||
expect(clickRes.ok).toBe(true)
|
||||
|
||||
// 5. Take a screenshot
|
||||
const ssRes = await rpc('browser.screenshot')
|
||||
expect(ssRes.ok).toBe(true)
|
||||
|
||||
// 6. Check tab list
|
||||
const tabRes = await rpc('browser.tabList')
|
||||
expect(tabRes.ok).toBe(true)
|
||||
const tabs = (tabRes.result as { tabs: { url: string }[] }).tabs
|
||||
expect(tabs[0].url).toBe('https://search.example.com')
|
||||
})
|
||||
|
||||
// ── No tab errors ──
|
||||
|
||||
it('returns browser_no_tab when no tabs are registered', async () => {
|
||||
// Create a fresh setup with no registered tabs
|
||||
const emptyManager = new BrowserManager()
|
||||
const emptyBridge = new CdpBridge(emptyManager)
|
||||
|
||||
const userDataPath2 = mkdtempSync(join(tmpdir(), 'browser-e2e-empty-'))
|
||||
const runtime2 = new OrcaRuntimeService()
|
||||
runtime2.setCdpBridge(emptyBridge)
|
||||
|
||||
const server2 = new OrcaRuntimeRpcServer({ runtime: runtime2, userDataPath: userDataPath2 })
|
||||
await server2.start()
|
||||
|
||||
const metadata2 = readRuntimeMetadata(userDataPath2)!
|
||||
    // Why: stop the extra server even when an assertion fails, so a failing
    // test does not leak a listening socket into later tests.
    try {
      const res = await sendRequest(metadata2.transport!.endpoint, {
        id: 'req_no_tab',
        authToken: metadata2.authToken,
        method: 'browser.snapshot'
      })

      expect(res.ok).toBe(false)
      expect((res.error as { code: string }).code).toBe('browser_no_tab')
    } finally {
      await server2.stop()
    }
|
||||
})
|
||||
})
|
||||
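The `sendRequest` helper in the test file above frames each RPC message as one JSON document terminated by a newline, and reads the response by buffering until the first newline arrives. A minimal sketch of that framing follows; `encodeFrame` and `takeFrame` are hypothetical names used only for illustration and are not part of the Orca codebase.

```typescript
// Hypothetical helpers for illustration; not part of the Orca codebase.
type Frame = { message: Record<string, unknown>; rest: string }

// Serialize one request as a single newline-terminated JSON line.
// JSON.stringify escapes newlines inside strings, so the terminator is unambiguous.
function encodeFrame(request: Record<string, unknown>): string {
  return `${JSON.stringify(request)}\n`
}

// Return the first complete message in the buffer plus the unread remainder,
// or null when no newline has arrived yet and the caller should keep buffering.
function takeFrame(buffer: string): Frame | null {
  const newlineIndex = buffer.indexOf('\n')
  if (newlineIndex === -1) {
    return null
  }
  return {
    message: JSON.parse(buffer.slice(0, newlineIndex)) as Record<string, unknown>,
    rest: buffer.slice(newlineIndex + 1)
  }
}
```

A caller would append each incoming chunk to its buffer and call `takeFrame` until it returns a non-null frame, which is the same shape as the `data` handler in the test helper.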
638
src/main/browser/cdp-bridge.ts
Normal file
|
|
@ -0,0 +1,638 @@
|
|||
/* eslint-disable max-lines -- Why: the CDP bridge owns debugger lifecycle, ref map management, command serialization, and all browser interaction logic in one module so the browser automation boundary stays coherent. */
|
||||
import { webContents } from 'electron'
|
||||
import type {
|
||||
BrowserClickResult,
|
||||
BrowserEvalResult,
|
||||
BrowserFillResult,
|
||||
BrowserGotoResult,
|
||||
BrowserScreenshotResult,
|
||||
BrowserScrollResult,
|
||||
BrowserSelectResult,
|
||||
BrowserSnapshotResult,
|
||||
BrowserTabInfo,
|
||||
BrowserTabListResult,
|
||||
BrowserTabSwitchResult,
|
||||
BrowserTypeResult
|
||||
} from '../../shared/runtime-types'
|
||||
import { buildSnapshot, type CdpCommandSender, type SnapshotResult } from './snapshot-engine'
|
||||
import type { BrowserManager } from './browser-manager'
|
||||
|
||||
export class BrowserError extends Error {
|
||||
constructor(
|
||||
readonly code: string,
|
||||
message: string
|
||||
) {
|
||||
super(message)
|
||||
}
|
||||
}
|
||||
|
||||
type TabState = {
|
||||
navigationId: string | null
|
||||
snapshotResult: SnapshotResult | null
|
||||
debuggerAttached: boolean
|
||||
}
|
||||
|
||||
type QueuedCommand = {
|
||||
execute: () => Promise<unknown>
|
||||
resolve: (value: unknown) => void
|
||||
reject: (reason: unknown) => void
|
||||
}
|
||||
|
||||
export class CdpBridge {
|
||||
private activeWebContentsId: number | null = null
|
||||
private readonly tabState = new Map<string, TabState>()
|
||||
private readonly commandQueues = new Map<string, QueuedCommand[]>()
|
||||
private readonly processingQueues = new Set<string>()
|
||||
private readonly browserManager: BrowserManager
|
||||
|
||||
constructor(browserManager: BrowserManager) {
|
||||
this.browserManager = browserManager
|
||||
}
|
||||
|
||||
setActiveTab(webContentsId: number): void {
|
||||
this.activeWebContentsId = webContentsId
|
||||
}
|
||||
|
||||
getActiveWebContentsId(): number | null {
|
||||
return this.activeWebContentsId
|
||||
}
|
||||
|
||||
async snapshot(): Promise<BrowserSnapshotResult> {
|
||||
return this.enqueueCommand(async () => {
|
||||
const guest = this.getActiveGuest()
|
||||
const sender = this.makeCdpSender(guest)
|
||||
await this.ensureDebuggerAttached(guest)
|
||||
|
||||
const result = await buildSnapshot(sender)
|
||||
const tabId = this.resolveTabId(guest.id)
|
||||
|
||||
const state = this.getOrCreateTabState(tabId)
|
||||
state.snapshotResult = result
|
||||
|
||||
const navId = await this.getNavigationId(sender)
|
||||
state.navigationId = navId
|
||||
|
||||
return {
|
||||
snapshot: result.snapshot,
|
||||
refs: result.refs,
|
||||
url: guest.getURL(),
|
||||
title: guest.getTitle()
|
||||
}
|
||||
})
|
||||
}
|
||||
|
||||
async click(element: string): Promise<BrowserClickResult> {
|
||||
return this.enqueueCommand(async () => {
|
||||
const guest = this.getActiveGuest()
|
||||
const sender = this.makeCdpSender(guest)
|
||||
await this.ensureDebuggerAttached(guest)
|
||||
|
||||
const node = await this.resolveRef(guest, sender, element)
|
||||
|
||||
await sender('DOM.scrollIntoViewIfNeeded', { backendNodeId: node.backendDOMNodeId })
|
||||
const { model } = (await sender('DOM.getBoxModel', {
|
||||
backendNodeId: node.backendDOMNodeId
|
||||
})) as { model: { content: number[] } }
|
||||
|
||||
const [x1, y1, , , x3, y3] = model.content
|
||||
const cx = (x1 + x3) / 2
|
||||
const cy = (y1 + y3) / 2
|
||||
|
||||
await sender('Input.dispatchMouseEvent', {
|
||||
type: 'mousePressed',
|
||||
x: cx,
|
||||
y: cy,
|
||||
button: 'left',
|
||||
clickCount: 1
|
||||
})
|
||||
await sender('Input.dispatchMouseEvent', {
|
||||
type: 'mouseReleased',
|
||||
x: cx,
|
||||
y: cy,
|
||||
button: 'left',
|
||||
clickCount: 1
|
||||
})
|
||||
|
||||
return { clicked: element }
|
||||
})
|
||||
}
|
||||
|
||||
async goto(url: string): Promise<BrowserGotoResult> {
|
||||
return this.enqueueCommand(async () => {
|
||||
const guest = this.getActiveGuest()
|
||||
const sender = this.makeCdpSender(guest)
|
||||
await this.ensureDebuggerAttached(guest)
|
||||
|
||||
const { errorText } = (await sender('Page.navigate', { url })) as {
|
||||
errorText?: string
|
||||
}
|
||||
|
||||
if (errorText) {
|
||||
throw new BrowserError('browser_navigation_failed', `Navigation failed: ${errorText}`)
|
||||
}
|
||||
|
||||
await this.waitForLoad(sender)
|
||||
this.invalidateRefMap(guest.id)
|
||||
|
||||
return { url: guest.getURL(), title: guest.getTitle() }
|
||||
})
|
||||
}
|
||||
|
||||
async fill(element: string, value: string): Promise<BrowserFillResult> {
|
||||
return this.enqueueCommand(async () => {
|
||||
const guest = this.getActiveGuest()
|
||||
const sender = this.makeCdpSender(guest)
|
||||
await this.ensureDebuggerAttached(guest)
|
||||
|
||||
const node = await this.resolveRef(guest, sender, element)
|
||||
|
||||
await sender('DOM.focus', { backendNodeId: node.backendDOMNodeId })
|
||||
|
||||
// Why: select-all then delete clears any existing value before typing,
|
||||
// matching the behavior of Playwright's fill() and agent-browser's fill.
|
||||
await sender('Input.dispatchKeyEvent', {
|
||||
type: 'keyDown',
|
||||
key: 'a',
|
||||
modifiers: process.platform === 'darwin' ? 4 : 2
|
||||
})
|
||||
await sender('Input.dispatchKeyEvent', {
|
||||
type: 'keyUp',
|
||||
key: 'a',
|
||||
modifiers: process.platform === 'darwin' ? 4 : 2
|
||||
})
|
||||
await sender('Input.dispatchKeyEvent', { type: 'keyDown', key: 'Delete' })
|
||||
await sender('Input.dispatchKeyEvent', { type: 'keyUp', key: 'Delete' })
|
||||
|
||||
await sender('Input.insertText', { text: value })
|
||||
|
||||
return { filled: element }
|
||||
})
|
||||
}
|
||||
|
||||
async type(input: string): Promise<BrowserTypeResult> {
|
||||
return this.enqueueCommand(async () => {
|
||||
const guest = this.getActiveGuest()
|
||||
const sender = this.makeCdpSender(guest)
|
||||
await this.ensureDebuggerAttached(guest)
|
||||
|
||||
await sender('Input.insertText', { text: input })
|
||||
return { typed: true }
|
||||
})
|
||||
}
|
||||
|
||||
async select(element: string, value: string): Promise<BrowserSelectResult> {
|
||||
return this.enqueueCommand(async () => {
|
||||
const guest = this.getActiveGuest()
|
||||
const sender = this.makeCdpSender(guest)
|
||||
await this.ensureDebuggerAttached(guest)
|
||||
|
||||
const node = await this.resolveRef(guest, sender, element)
|
||||
      // Why: DOM.requestNode takes a Runtime objectId, not a backendNodeId.
      // DOM.resolveNode accepts backendNodeId directly, so a single call
      // yields the remote object handle needed by Runtime.callFunctionOn.
      const { object } = (await sender('DOM.resolveNode', {
        backendNodeId: node.backendDOMNodeId
      })) as {
        object: { objectId: string }
      }
|
||||
|
||||
await sender('Runtime.callFunctionOn', {
|
||||
objectId: object.objectId,
|
||||
functionDeclaration: `function(val) {
|
||||
this.value = val;
|
||||
this.dispatchEvent(new Event('input', { bubbles: true }));
|
||||
this.dispatchEvent(new Event('change', { bubbles: true }));
|
||||
}`,
|
||||
arguments: [{ value }]
|
||||
})
|
||||
|
||||
return { selected: element }
|
||||
})
|
||||
}
|
||||
|
||||
async scroll(direction: 'up' | 'down', amount?: number): Promise<BrowserScrollResult> {
|
||||
return this.enqueueCommand(async () => {
|
||||
const guest = this.getActiveGuest()
|
||||
const sender = this.makeCdpSender(guest)
|
||||
await this.ensureDebuggerAttached(guest)
|
||||
|
||||
const { result: viewportResult } = (await sender('Runtime.evaluate', {
|
||||
expression: 'JSON.stringify({ w: window.innerWidth, h: window.innerHeight })',
|
||||
returnByValue: true
|
||||
})) as { result: { value: string } }
|
||||
const viewport = JSON.parse(viewportResult.value) as { w: number; h: number }
|
||||
const scrollAmount = amount ?? viewport.h
|
||||
|
||||
const deltaY = direction === 'down' ? scrollAmount : -scrollAmount
|
||||
await sender('Input.dispatchMouseEvent', {
|
||||
type: 'mouseWheel',
|
||||
x: viewport.w / 2,
|
||||
y: viewport.h / 2,
|
||||
deltaX: 0,
|
||||
deltaY
|
||||
})
|
||||
|
||||
return { scrolled: direction }
|
||||
})
|
||||
}
|
||||
|
||||
async back(): Promise<{ url: string; title: string }> {
|
||||
return this.enqueueCommand(async () => {
|
||||
const guest = this.getActiveGuest()
|
||||
const sender = this.makeCdpSender(guest)
|
||||
await this.ensureDebuggerAttached(guest)
|
||||
|
||||
await sender('Page.navigateToHistoryEntry', {
|
||||
entryId: await this.getPreviousHistoryEntryId(sender)
|
||||
})
|
||||
await this.waitForLoad(sender)
|
||||
this.invalidateRefMap(guest.id)
|
||||
|
||||
return { url: guest.getURL(), title: guest.getTitle() }
|
||||
})
|
||||
}
|
||||
|
||||
async reload(): Promise<{ url: string; title: string }> {
|
||||
return this.enqueueCommand(async () => {
|
||||
const guest = this.getActiveGuest()
|
||||
const sender = this.makeCdpSender(guest)
|
||||
await this.ensureDebuggerAttached(guest)
|
||||
|
||||
await sender('Page.reload')
|
||||
await this.waitForLoad(sender)
|
||||
this.invalidateRefMap(guest.id)
|
||||
|
||||
return { url: guest.getURL(), title: guest.getTitle() }
|
||||
})
|
||||
}
|
||||
|
||||
async screenshot(format: 'png' | 'jpeg' = 'png'): Promise<BrowserScreenshotResult> {
|
||||
return this.enqueueCommand(async () => {
|
||||
const guest = this.getActiveGuest()
|
||||
const sender = this.makeCdpSender(guest)
|
||||
await this.ensureDebuggerAttached(guest)
|
||||
|
||||
const { data } = (await sender('Page.captureScreenshot', {
|
||||
format
|
||||
})) as { data: string }
|
||||
|
||||
return { data, format }
|
||||
})
|
||||
}
|
||||
|
||||
async evaluate(expression: string): Promise<BrowserEvalResult> {
|
||||
return this.enqueueCommand(async () => {
|
||||
const guest = this.getActiveGuest()
|
||||
const sender = this.makeCdpSender(guest)
|
||||
await this.ensureDebuggerAttached(guest)
|
||||
|
||||
const { result, exceptionDetails } = (await sender('Runtime.evaluate', {
|
||||
expression,
|
||||
returnByValue: true
|
||||
})) as {
|
||||
result: { value?: unknown; type: string; description?: string }
|
||||
exceptionDetails?: { text: string; exception?: { description?: string } }
|
||||
}
|
||||
|
||||
if (exceptionDetails) {
|
||||
throw new BrowserError(
|
||||
'browser_eval_error',
|
||||
exceptionDetails.exception?.description ?? exceptionDetails.text
|
||||
)
|
||||
}
|
||||
|
||||
return {
|
||||
value: result.value !== undefined ? String(result.value) : (result.description ?? '')
|
||||
}
|
||||
})
|
||||
}
|
||||
|
||||
tabList(): BrowserTabListResult {
|
||||
const tabs: BrowserTabInfo[] = []
|
||||
let index = 0
|
||||
|
||||
for (const [_tabId, wcId] of this.getRegisteredTabs()) {
|
||||
const guest = webContents.fromId(wcId)
|
||||
if (!guest || guest.isDestroyed()) {
|
||||
continue
|
||||
}
|
||||
tabs.push({
|
||||
index,
|
||||
url: guest.getURL(),
|
||||
title: guest.getTitle(),
|
||||
active: wcId === this.activeWebContentsId
|
||||
})
|
||||
index++
|
||||
}
|
||||
|
||||
return { tabs }
|
||||
}
|
||||
|
||||
async tabSwitch(index: number): Promise<BrowserTabSwitchResult> {
|
||||
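    // Why: tabList() skips destroyed guests when assigning indices, so the
    // same filter is applied here to keep both commands' indices aligned.
    const entries = [...this.getRegisteredTabs()].filter(([, wcId]) => {
      const guest = webContents.fromId(wcId)
      return !!guest && !guest.isDestroyed()
    })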
|
||||
if (index < 0 || index >= entries.length) {
|
||||
throw new BrowserError(
|
||||
'browser_tab_not_found',
|
||||
`Tab index ${index} is out of range. ${entries.length} tab(s) open.`
|
||||
)
|
||||
}
|
||||
|
||||
const [_tabId, wcId] = entries[index]
|
||||
if (this.activeWebContentsId !== null) {
|
||||
this.invalidateRefMap(this.activeWebContentsId)
|
||||
}
|
||||
this.activeWebContentsId = wcId
|
||||
|
||||
return { switched: index }
|
||||
}
|
||||
|
||||
onTabClosed(webContentsId: number): void {
|
||||
if (this.activeWebContentsId === webContentsId) {
|
||||
this.activeWebContentsId = null
|
||||
}
|
||||
const tabId = this.resolveTabIdSafe(webContentsId)
|
||||
if (tabId) {
|
||||
this.tabState.delete(tabId)
|
||||
this.commandQueues.delete(tabId)
|
||||
}
|
||||
}
|
||||
|
||||
onTabChanged(webContentsId: number): void {
|
||||
this.activeWebContentsId = webContentsId
|
||||
}
|
||||
|
||||
// ── Private helpers ──
|
||||
|
||||
private getActiveGuest(): Electron.WebContents {
|
||||
if (this.activeWebContentsId !== null) {
|
||||
const guest = webContents.fromId(this.activeWebContentsId)
|
||||
if (guest && !guest.isDestroyed()) {
|
||||
return guest
|
||||
}
|
||||
// Why: the stored webContentsId may be stale after a Chromium process swap
|
||||
// (navigation to a different-origin page, crash recovery). Fall through to
|
||||
// the auto-select logic rather than immediately failing, since the tab may
|
||||
// still be alive under a new webContentsId.
|
||||
this.activeWebContentsId = null
|
||||
}
|
||||
|
||||
const tabs = [...this.getRegisteredTabs()]
|
||||
if (tabs.length === 0) {
|
||||
throw new BrowserError(
|
||||
'browser_no_tab',
|
||||
'No browser tab is open. Use the Orca UI to open a browser tab first.'
|
||||
)
|
||||
}
|
||||
if (tabs.length === 1) {
|
||||
this.activeWebContentsId = tabs[0][1]
|
||||
} else {
|
||||
throw new BrowserError(
|
||||
'browser_no_tab',
|
||||
"Multiple browser tabs are open. Run 'orca tab list' and 'orca tab switch --index <n>' to select one."
|
||||
)
|
||||
}
|
||||
|
||||
const guest = webContents.fromId(this.activeWebContentsId!)
|
||||
if (!guest || guest.isDestroyed()) {
|
||||
this.activeWebContentsId = null
|
||||
throw new BrowserError(
|
||||
'browser_debugger_detached',
|
||||
"The active browser tab was closed. Run 'orca tab list' to find remaining tabs."
|
||||
)
|
||||
}
|
||||
return guest
|
||||
}
|
||||
|
||||
private getRegisteredTabs(): Map<string, number> {
|
||||
    // Why: BrowserManager keeps its tab maps private and exposes no way to
    // enumerate tabs, so this cast reads the private webContentsIdByTabId map
    // directly. It gives the CDP bridge the tab enumeration it needs without
    // changing BrowserManager itself. In the future a public listTabs()
    // method on BrowserManager would be cleaner.
|
||||
return (this.browserManager as unknown as { webContentsIdByTabId: Map<string, number> })
|
||||
.webContentsIdByTabId
|
||||
}
|
||||
|
||||
private resolveTabId(webContentsId: number): string {
|
||||
for (const [tabId, wcId] of this.getRegisteredTabs()) {
|
||||
if (wcId === webContentsId) {
|
||||
return tabId
|
||||
}
|
||||
}
|
||||
throw new BrowserError('browser_debugger_detached', 'Tab is no longer registered.')
|
||||
}
|
||||
|
||||
private resolveTabIdSafe(webContentsId: number): string | null {
|
||||
for (const [tabId, wcId] of this.getRegisteredTabs()) {
|
||||
if (wcId === webContentsId) {
|
||||
return tabId
|
||||
}
|
||||
}
|
||||
return null
|
||||
}
|
||||
|
||||
private getOrCreateTabState(tabId: string): TabState {
|
||||
let state = this.tabState.get(tabId)
|
||||
if (!state) {
|
||||
state = { navigationId: null, snapshotResult: null, debuggerAttached: false }
|
||||
this.tabState.set(tabId, state)
|
||||
}
|
||||
return state
|
||||
}
|
||||
|
||||
private async ensureDebuggerAttached(guest: Electron.WebContents): Promise<void> {
|
||||
const tabId = this.resolveTabId(guest.id)
|
||||
const state = this.getOrCreateTabState(tabId)
|
||||
if (state.debuggerAttached) {
|
||||
return
|
||||
}
|
||||
|
||||
try {
|
||||
guest.debugger.attach('1.3')
|
||||
} catch {
|
||||
throw new BrowserError(
|
||||
'browser_cdp_error',
|
||||
'Could not attach debugger. DevTools may already be open for this tab.'
|
||||
)
|
||||
}
|
||||
|
||||
await this.makeCdpSender(guest)('Page.enable')
|
||||
await this.makeCdpSender(guest)('DOM.enable')
|
||||
|
||||
guest.debugger.on('detach', () => {
|
||||
state.debuggerAttached = false
|
||||
state.snapshotResult = null
|
||||
})
|
||||
|
||||
guest.debugger.on('message', (_event: unknown, method: string) => {
|
||||
if (method === 'Page.frameNavigated') {
|
||||
state.snapshotResult = null
|
||||
state.navigationId = null
|
||||
}
|
||||
})
|
||||
|
||||
state.debuggerAttached = true
|
||||
}
|
||||
|
||||
private makeCdpSender(guest: Electron.WebContents): CdpCommandSender {
|
||||
    return (method: string, params?: Record<string, unknown>) => {
      const command = guest.debugger.sendCommand(method, params) as Promise<unknown>
      // Why: Electron's CDP sendCommand can hang indefinitely if the debugger
      // session is stale (e.g. after a renderer process swap that wasn't detected).
      // A 10s timeout prevents the RPC from blocking until the CLI's socket timeout.
      let timer: ReturnType<typeof setTimeout> | undefined
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(
          () => reject(new BrowserError('browser_cdp_error', `CDP command "${method}" timed out`)),
          10_000
        )
      })
      // Why: clear the timer once the race settles so a command that already
      // finished does not leave a pending 10s timer behind.
      return Promise.race([command, timeout]).finally(() => clearTimeout(timer))
    }
|
||||
}
|
||||
|
||||
private async resolveRef(
|
||||
guest: Electron.WebContents,
|
||||
sender: CdpCommandSender,
|
||||
ref: string
|
||||
): Promise<{ backendDOMNodeId: number; role: string; name: string }> {
|
||||
const tabId = this.resolveTabId(guest.id)
|
||||
const state = this.getOrCreateTabState(tabId)
|
||||
|
||||
if (!state.snapshotResult) {
|
||||
throw new BrowserError(
|
||||
'browser_stale_ref',
|
||||
"No snapshot exists for this tab. Run 'orca snapshot' first."
|
||||
)
|
||||
}
|
||||
|
||||
const entry = state.snapshotResult.refMap.get(ref)
|
||||
if (!entry) {
|
||||
throw new BrowserError(
|
||||
'browser_ref_not_found',
|
||||
`Element ref ${ref} was not found. Run 'orca snapshot' to see available refs.`
|
||||
)
|
||||
}
|
||||
|
||||
const currentNavId = await this.getNavigationId(sender)
|
||||
if (state.navigationId && currentNavId !== state.navigationId) {
|
||||
state.snapshotResult = null
|
||||
state.navigationId = null
|
||||
throw new BrowserError(
|
||||
'browser_stale_ref',
|
||||
"The page has navigated since the last snapshot. Run 'orca snapshot' to get fresh refs."
|
||||
)
|
||||
}
|
||||
|
||||
try {
|
||||
await sender('DOM.describeNode', { backendNodeId: entry.backendDOMNodeId })
|
||||
} catch {
|
||||
state.snapshotResult = null
|
||||
throw new BrowserError(
|
||||
'browser_stale_ref',
|
||||
`Element ${ref} no longer exists in the DOM. Run 'orca snapshot' to get fresh refs.`
|
||||
)
|
||||
}
|
||||
|
||||
return entry
|
||||
}
|
||||
|
||||
private async getNavigationId(sender: CdpCommandSender): Promise<string> {
|
||||
const { entries, currentIndex } = (await sender('Page.getNavigationHistory')) as {
|
||||
entries: { id: number; url: string }[]
|
||||
currentIndex: number
|
||||
}
|
||||
const current = entries[currentIndex]
|
||||
return current ? `${current.id}:${current.url}` : 'unknown'
|
||||
}
|
||||
|
||||
private async getPreviousHistoryEntryId(sender: CdpCommandSender): Promise<number> {
|
||||
const { entries, currentIndex } = (await sender('Page.getNavigationHistory')) as {
|
||||
entries: { id: number }[]
|
||||
currentIndex: number
|
||||
}
|
||||
if (currentIndex <= 0) {
|
||||
throw new BrowserError('browser_navigation_failed', 'No previous history entry.')
|
||||
}
|
||||
return entries[currentIndex - 1].id
|
||||
}
|
||||
|
||||
private async waitForLoad(sender: CdpCommandSender): Promise<void> {
|
||||
await sender('Page.enable')
|
||||
    await new Promise<void>((resolve, reject) => {
      // Why: once the promise settles (load, error, or timeout) the poll loop
      // must stop; otherwise check() keeps rescheduling itself after a timeout.
      let settled = false
      const timeout = setTimeout(() => {
        settled = true
        reject(new BrowserError('browser_timeout', 'Page load timed out after 30 seconds.'))
      }, 30_000)

      const check = async (): Promise<void> => {
        if (settled) {
          return
        }
        try {
          const { result } = (await sender('Runtime.evaluate', {
            expression: 'document.readyState',
            returnByValue: true
          })) as { result: { value: string } }
          if (settled) {
            return
          }
          if (result.value === 'complete') {
            settled = true
            clearTimeout(timeout)
            resolve()
          } else {
            setTimeout(check, 100)
          }
        } catch {
          if (settled) {
            return
          }
          settled = true
          clearTimeout(timeout)
          reject(new BrowserError('browser_cdp_error', 'Failed to check page load state.'))
        }
      }
      void check()
    })
|
||||
}
|
||||
|
||||
private invalidateRefMap(webContentsId: number): void {
|
||||
const tabId = this.resolveTabIdSafe(webContentsId)
|
||||
if (tabId) {
|
||||
const state = this.tabState.get(tabId)
|
||||
if (state) {
|
||||
state.snapshotResult = null
|
||||
state.navigationId = null
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
private async enqueueCommand<T>(execute: () => Promise<T>): Promise<T> {
|
||||
const guest = this.getActiveGuest()
|
||||
const tabId = this.resolveTabId(guest.id)
|
||||
|
||||
return new Promise<T>((resolve, reject) => {
|
||||
let queue = this.commandQueues.get(tabId)
|
||||
if (!queue) {
|
||||
queue = []
|
||||
this.commandQueues.set(tabId, queue)
|
||||
}
|
||||
queue.push({
|
||||
execute: execute as () => Promise<unknown>,
|
||||
resolve: resolve as (value: unknown) => void,
|
||||
reject
|
||||
})
|
||||
this.processQueue(tabId)
|
||||
})
|
||||
}
|
||||
|
||||
private async processQueue(tabId: string): Promise<void> {
|
||||
if (this.processingQueues.has(tabId)) {
|
||||
return
|
||||
}
|
||||
this.processingQueues.add(tabId)
|
||||
|
||||
const queue = this.commandQueues.get(tabId)
|
||||
while (queue && queue.length > 0) {
|
||||
const cmd = queue.shift()!
|
||||
try {
|
||||
const result = await cmd.execute()
|
||||
cmd.resolve(result)
|
||||
} catch (error) {
|
||||
cmd.reject(error)
|
||||
}
|
||||
}
|
||||
|
||||
this.processingQueues.delete(tabId)
|
||||
}
|
||||
}
|
||||
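The `click` method in `cdp-bridge.ts` above derives its click point from the `content` quad returned by `DOM.getBoxModel`, taking the midpoint of the first and third corners. A minimal sketch of that calculation as a standalone function follows; `quadCenter` and `Point` are hypothetical names for illustration, not part of the Orca codebase, and the length check is an added assumption rather than something the bridge does.

```typescript
// Hypothetical helper for illustration; not part of the Orca codebase.
// A CDP quad is 8 numbers: x1, y1, x2, y2, x3, y3, x4, y4,
// one x/y pair per corner, listed clockwise.
type Point = { x: number; y: number }

function quadCenter(quad: number[]): Point {
  if (quad.length !== 8) {
    // Assumption: reject malformed quads instead of producing NaN coordinates.
    throw new Error(`Expected 8 quad values, got ${quad.length}`)
  }
  const [x1, y1, , , x3, y3] = quad
  // Corners 1 and 3 are diagonally opposite, so their midpoint is the center.
  return { x: (x1 + x3) / 2, y: (y1 + y3) / 2 }
}

// Example: a 100x50 box whose top-left corner is at (10, 20).
const center = quadCenter([10, 20, 110, 20, 110, 70, 10, 70])
console.log(center.x, center.y)
```

Using the diagonal midpoint rather than averaging only the top edge keeps the point inside the box even when the element is rotated or skewed by a CSS transform.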
196
src/main/browser/snapshot-engine.test.ts
Normal file
|
|
@ -0,0 +1,196 @@
|
|||
import { describe, expect, it, vi } from 'vitest'
|
||||
import { buildSnapshot, type CdpCommandSender } from './snapshot-engine'
|
||||
|
||||
type AXNode = {
|
||||
nodeId: string
|
||||
backendDOMNodeId?: number
|
||||
role?: { type: string; value: string }
|
||||
name?: { type: string; value: string }
|
||||
properties?: { name: string; value: { type: string; value: unknown } }[]
|
||||
childIds?: string[]
|
||||
ignored?: boolean
|
||||
}
|
||||
|
||||
function makeSender(nodes: AXNode[]): CdpCommandSender {
|
||||
return vi.fn(async (method: string) => {
|
||||
if (method === 'Accessibility.enable') {
|
||||
return {}
|
||||
}
|
||||
if (method === 'Accessibility.getFullAXTree') {
|
||||
return { nodes }
|
||||
}
|
||||
throw new Error(`Unexpected CDP method: ${method}`)
|
||||
})
|
||||
}
|
||||
|
||||
function node(
|
||||
id: string,
|
||||
role: string,
|
||||
name: string,
|
||||
opts?: {
|
||||
childIds?: string[]
|
||||
backendDOMNodeId?: number
|
||||
ignored?: boolean
|
||||
properties?: AXNode['properties']
|
||||
}
|
||||
): AXNode {
|
||||
return {
|
||||
nodeId: id,
|
||||
backendDOMNodeId: opts?.backendDOMNodeId ?? parseInt(id, 10),
|
||||
role: { type: 'role', value: role },
|
||||
name: { type: 'computedString', value: name },
|
||||
childIds: opts?.childIds,
|
||||
ignored: opts?.ignored,
|
||||
properties: opts?.properties
|
||||
}
|
||||
}
|
||||
|
||||
describe('buildSnapshot', () => {
|
||||
it('returns empty snapshot for empty tree', async () => {
|
||||
const result = await buildSnapshot(makeSender([]))
|
||||
expect(result.snapshot).toBe('')
|
||||
expect(result.refs).toEqual([])
|
||||
expect(result.refMap.size).toBe(0)
|
||||
})
|
||||
|
||||
it('assigns refs to interactive elements', async () => {
|
||||
const nodes: AXNode[] = [
|
||||
node('1', 'WebArea', 'page', { childIds: ['2', '3'] }),
|
||||
node('2', 'button', 'Submit', { backendDOMNodeId: 10 }),
|
||||
node('3', 'link', 'Home', { backendDOMNodeId: 11 })
|
||||
]
|
||||
const result = await buildSnapshot(makeSender(nodes))
|
||||
|
||||
expect(result.refs).toHaveLength(2)
|
||||
expect(result.refs[0]).toEqual({ ref: '@e1', role: 'button', name: 'Submit' })
|
||||
expect(result.refs[1]).toEqual({ ref: '@e2', role: 'link', name: 'Home' })
|
||||
expect(result.snapshot).toContain('[@e1] button "Submit"')
|
||||
expect(result.snapshot).toContain('[@e2] link "Home"')
|
||||
})
|
||||
|
||||
it('renders text inputs with friendly role name', async () => {
|
||||
const nodes: AXNode[] = [
|
||||
node('1', 'WebArea', 'page', { childIds: ['2'] }),
|
||||
node('2', 'textbox', 'Email', { backendDOMNodeId: 10 })
|
||||
]
|
||||
const result = await buildSnapshot(makeSender(nodes))
|
||||
expect(result.snapshot).toContain('text input "Email"')
|
||||
})
|
||||
|
||||
it('renders landmarks without refs', async () => {
|
||||
const nodes: AXNode[] = [
|
||||
node('1', 'WebArea', 'page', { childIds: ['2'] }),
|
||||
node('2', 'navigation', 'Main Nav', { childIds: ['3'] }),
|
||||
node('3', 'link', 'About', { backendDOMNodeId: 10 })
|
||||
]
|
||||
const result = await buildSnapshot(makeSender(nodes))
|
||||
|
||||
expect(result.snapshot).toContain('[Main Nav]')
|
||||
expect(result.refs).toHaveLength(1)
|
||||
expect(result.refs[0].name).toBe('About')
|
||||
})
|
||||
|
||||
it('renders headings without refs', async () => {
|
||||
const nodes: AXNode[] = [
|
||||
node('1', 'WebArea', 'page', { childIds: ['2'] }),
|
||||
node('2', 'heading', 'Welcome')
|
||||
]
|
||||
const result = await buildSnapshot(makeSender(nodes))
|
||||
expect(result.snapshot).toContain('heading "Welcome"')
|
||||
expect(result.refs).toHaveLength(0)
|
||||
})
|
||||
|
||||
it('renders static text without refs', async () => {
|
||||
const nodes: AXNode[] = [
|
||||
node('1', 'WebArea', 'page', { childIds: ['2'] }),
|
||||
node('2', 'staticText', 'Hello world')
|
||||
]
|
||||
const result = await buildSnapshot(makeSender(nodes))
|
||||
expect(result.snapshot).toContain('text "Hello world"')
|
||||
expect(result.refs).toHaveLength(0)
|
||||
})
|
||||
|
||||
it('skips generic/none/presentation roles', async () => {
|
||||
const nodes: AXNode[] = [
|
||||
node('1', 'WebArea', 'page', { childIds: ['2'] }),
|
||||
node('2', 'generic', '', { childIds: ['3'] }),
|
||||
node('3', 'button', 'OK', { backendDOMNodeId: 10 })
|
||||
]
|
||||
const result = await buildSnapshot(makeSender(nodes))
|
||||
expect(result.refs).toHaveLength(1)
|
||||
expect(result.refs[0].name).toBe('OK')
|
||||
expect(result.snapshot).not.toContain('generic')
|
||||
})
|
||||
|
||||
it('skips ignored nodes but walks their children', async () => {
|
||||
const nodes: AXNode[] = [
|
||||
node('1', 'WebArea', 'page', { childIds: ['2'] }),
|
||||
node('2', 'group', 'ignored group', { childIds: ['3'], ignored: true }),
|
||||
node('3', 'button', 'Deep', { backendDOMNodeId: 10 })
|
||||
]
|
||||
const result = await buildSnapshot(makeSender(nodes))
|
||||
expect(result.refs).toHaveLength(1)
|
||||
expect(result.refs[0].name).toBe('Deep')
|
||||
})
|
||||
|
||||
it('skips interactive elements without a name', async () => {
|
||||
const nodes: AXNode[] = [
|
||||
node('1', 'WebArea', 'page', { childIds: ['2', '3'] }),
|
||||
node('2', 'button', '', { backendDOMNodeId: 10 }),
|
||||
node('3', 'button', 'Labeled', { backendDOMNodeId: 11 })
|
||||
]
|
||||
const result = await buildSnapshot(makeSender(nodes))
|
||||
expect(result.refs).toHaveLength(1)
|
||||
expect(result.refs[0].name).toBe('Labeled')
|
||||
})
|
||||
|
||||
it('populates refMap with backendDOMNodeId', async () => {
|
||||
const nodes: AXNode[] = [
|
||||
node('1', 'WebArea', 'page', { childIds: ['2'] }),
|
||||
node('2', 'checkbox', 'Agree', { backendDOMNodeId: 42 })
|
||||
]
|
||||
const result = await buildSnapshot(makeSender(nodes))
|
||||
const entry = result.refMap.get('@e1')
|
||||
expect(entry).toBeDefined()
|
||||
expect(entry!.backendDOMNodeId).toBe(42)
|
||||
expect(entry!.role).toBe('checkbox')
|
||||
expect(entry!.name).toBe('Agree')
|
||||
})
|
||||
|
||||
it('indents children under landmarks', async () => {
|
||||
const nodes: AXNode[] = [
|
||||
node('1', 'WebArea', 'page', { childIds: ['2'] }),
|
||||
node('2', 'main', '', { childIds: ['3'] }),
|
||||
node('3', 'button', 'Action', { backendDOMNodeId: 10 })
|
||||
]
|
||||
const result = await buildSnapshot(makeSender(nodes))
|
||||
const lines = result.snapshot.split('\n')
|
||||
const mainLine = lines.find((l) => l.includes('[Main Content]'))
|
||||
const buttonLine = lines.find((l) => l.includes('Action'))
|
||||
expect(mainLine).toBeDefined()
|
||||
expect(buttonLine).toBeDefined()
|
||||
expect(buttonLine!.startsWith(' ')).toBe(true)
|
||||
})
|
||||
|
||||
it('handles a realistic page structure', async () => {
|
||||
const nodes: AXNode[] = [
|
||||
node('1', 'WebArea', 'page', { childIds: ['2', '3', '4'] }),
|
||||
node('2', 'banner', '', { childIds: ['5'] }),
|
||||
node('3', 'main', '', { childIds: ['6', '7', '8'] }),
|
||||
node('4', 'contentinfo', '', {}),
|
||||
node('5', 'link', 'Logo', { backendDOMNodeId: 10 }),
|
||||
node('6', 'heading', 'Dashboard'),
|
||||
node('7', 'textbox', 'Search', { backendDOMNodeId: 20 }),
|
||||
node('8', 'button', 'Go', { backendDOMNodeId: 21 })
|
||||
]
|
||||
const result = await buildSnapshot(makeSender(nodes))
|
||||
|
||||
expect(result.refs).toHaveLength(3)
|
||||
expect(result.refs.map((r) => r.name)).toEqual(['Logo', 'Search', 'Go'])
|
||||
|
||||
expect(result.snapshot).toContain('[Header]')
|
||||
expect(result.snapshot).toContain('[Main Content]')
|
||||
expect(result.snapshot).toContain('[Footer]')
|
||||
expect(result.snapshot).toContain('heading "Dashboard"')
|
||||
})
|
||||
})
|
||||
262
src/main/browser/snapshot-engine.ts
Normal file
@ -0,0 +1,262 @@
|
|||
import type { BrowserSnapshotRef } from '../../shared/runtime-types'

export type CdpCommandSender = (
  method: string,
  params?: Record<string, unknown>
) => Promise<unknown>

type AXNode = {
  nodeId: string
  backendDOMNodeId?: number
  role?: { type: string; value: string }
  name?: { type: string; value: string }
  properties?: { name: string; value: { type: string; value: unknown } }[]
  childIds?: string[]
  ignored?: boolean
}

type SnapshotEntry = {
  ref: string
  role: string
  name: string
  backendDOMNodeId: number
  depth: number
}

export type SnapshotResult = {
  snapshot: string
  refs: BrowserSnapshotRef[]
  refMap: Map<string, { backendDOMNodeId: number; role: string; name: string }>
}

const INTERACTIVE_ROLES = new Set([
  'button',
  'link',
  'textbox',
  'searchbox',
  'combobox',
  'checkbox',
  'radio',
  'switch',
  'slider',
  'spinbutton',
  'menuitem',
  'menuitemcheckbox',
  'menuitemradio',
  'tab',
  'option',
  'treeitem'
])

const LANDMARK_ROLES = new Set([
  'banner',
  'navigation',
  'main',
  'complementary',
  'contentinfo',
  'region',
  'form',
  'search'
])

const HEADING_PATTERN = /^heading$/

const SKIP_ROLES = new Set(['none', 'presentation', 'generic'])

export async function buildSnapshot(sendCommand: CdpCommandSender): Promise<SnapshotResult> {
  await sendCommand('Accessibility.enable')
  const { nodes } = (await sendCommand('Accessibility.getFullAXTree')) as { nodes: AXNode[] }

  const nodeById = new Map<string, AXNode>()
  for (const node of nodes) {
    nodeById.set(node.nodeId, node)
  }

  const entries: SnapshotEntry[] = []
  let refCounter = 1

  const root = nodes[0]
  if (!root) {
    return { snapshot: '', refs: [], refMap: new Map() }
  }

  walkTree(root, nodeById, 0, entries, () => refCounter++)

  const refMap = new Map<string, { backendDOMNodeId: number; role: string; name: string }>()
  const refs: BrowserSnapshotRef[] = []
  const lines: string[] = []

  for (const entry of entries) {
    const indent = ' '.repeat(entry.depth)
    if (entry.ref) {
      lines.push(`${indent}[${entry.ref}] ${entry.role} "${entry.name}"`)
      refs.push({ ref: entry.ref, role: entry.role, name: entry.name })
      refMap.set(entry.ref, {
        backendDOMNodeId: entry.backendDOMNodeId,
        role: entry.role,
        name: entry.name
      })
    } else {
      lines.push(`${indent}${entry.role} "${entry.name}"`)
    }
  }

  return { snapshot: lines.join('\n'), refs, refMap }
}

function walkTree(
  node: AXNode,
  nodeById: Map<string, AXNode>,
  depth: number,
  entries: SnapshotEntry[],
  nextRef: () => number
): void {
  // Ignored nodes produce no entry, but their descendants are still visited.
  if (node.ignored) {
    walkChildren(node, nodeById, depth, entries, nextRef)
    return
  }

  const role = node.role?.value ?? ''
  const name = node.name?.value ?? ''

  if (SKIP_ROLES.has(role)) {
    walkChildren(node, nodeById, depth, entries, nextRef)
    return
  }

  const isInteractive = INTERACTIVE_ROLES.has(role)
  const isHeading = HEADING_PATTERN.test(role)
  const isLandmark = LANDMARK_ROLES.has(role)
  const isStaticText = role === 'staticText' || role === 'StaticText'

  // Any other role is transparent: only its descendants can contribute entries.
  if (!isInteractive && !isHeading && !isLandmark && !isStaticText) {
    walkChildren(node, nodeById, depth, entries, nextRef)
    return
  }

  // Unnamed nodes are skipped, except landmarks, which get a default label.
  if (!name && !isLandmark) {
    walkChildren(node, nodeById, depth, entries, nextRef)
    return
  }

  const hasFocusable = isInteractive && isFocusable(node)

  if (isLandmark) {
    entries.push({
      ref: '',
      role: formatLandmarkRole(role, name),
      name: name || role,
      backendDOMNodeId: node.backendDOMNodeId ?? 0,
      depth
    })
    // Children of a landmark are indented one level deeper.
    walkChildren(node, nodeById, depth + 1, entries, nextRef)
    return
  }

  if (isHeading) {
    entries.push({
      ref: '',
      role: 'heading',
      name,
      backendDOMNodeId: node.backendDOMNodeId ?? 0,
      depth
    })
    return
  }

  if (isStaticText && name.trim().length > 0) {
    entries.push({
      ref: '',
      role: 'text',
      name: name.trim(),
      backendDOMNodeId: node.backendDOMNodeId ?? 0,
      depth
    })
    return
  }

  if (isInteractive && (hasFocusable || node.backendDOMNodeId)) {
    const ref = `@e${nextRef()}`
    entries.push({
      ref,
      role: formatInteractiveRole(role),
      // name is non-empty here (unnamed nodes returned above); the fallback is defensive.
      name: name || '(unlabeled)',
      backendDOMNodeId: node.backendDOMNodeId ?? 0,
      depth
    })
    return
  }

  walkChildren(node, nodeById, depth, entries, nextRef)
}

function walkChildren(
  node: AXNode,
  nodeById: Map<string, AXNode>,
  depth: number,
  entries: SnapshotEntry[],
  nextRef: () => number
): void {
  if (!node.childIds) {
    return
  }
  for (const childId of node.childIds) {
    const child = nodeById.get(childId)
    if (child) {
      walkTree(child, nodeById, depth, entries, nextRef)
    }
  }
}

function isFocusable(node: AXNode): boolean {
  if (!node.properties) {
    return true
  }
  const focusable = node.properties.find((p) => p.name === 'focusable')
  if (focusable && focusable.value.value === false) {
    return false
  }
  return true
}

function formatInteractiveRole(role: string): string {
  switch (role) {
    case 'textbox':
    case 'searchbox':
      return 'text input'
    case 'combobox':
      return 'combobox'
    case 'menuitem':
    case 'menuitemcheckbox':
    case 'menuitemradio':
      return 'menu item'
    case 'spinbutton':
      return 'number input'
    case 'treeitem':
      return 'tree item'
    default:
      return role
  }
}

function formatLandmarkRole(role: string, name: string): string {
  if (name) {
    return `[${name}]`
  }
  switch (role) {
    case 'banner':
      return '[Header]'
    case 'navigation':
      return '[Navigation]'
    case 'main':
      return '[Main Content]'
    case 'complementary':
      return '[Sidebar]'
    case 'contentinfo':
      return '[Footer]'
    case 'search':
      return '[Search]'
    default:
      return `[${role}]`
  }
}
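To make the traversal in snapshot-engine.ts easier to follow, here is a deliberately simplified sketch of the same idea: index the flat node list by ID, walk `childIds` from the first node, and hand out `@eN` refs in document order. Everything in it is an assumption made for illustration: `MiniNode`, `renderSnapshot`, the reduced role sets, and the two-space indent are hypothetical, and the line format for landmarks is shortened compared with what `buildSnapshot` emits.

```typescript
// Simplified, hypothetical node shape; the real CDP AXNode carries more fields.
type MiniNode = { id: string; role: string; name: string; childIds?: string[] }

const INTERACTIVE = new Set(['button', 'link', 'textbox'])
const LANDMARKS = new Set(['main', 'navigation'])

function renderSnapshot(nodes: MiniNode[]): string {
  const byId = new Map<string, MiniNode>()
  for (const n of nodes) {
    byId.set(n.id, n)
  }
  const lines: string[] = []
  let counter = 1

  const walk = (node: MiniNode, depth: number): void => {
    const indent = '  '.repeat(depth)
    let childDepth = depth
    if (LANDMARKS.has(node.role)) {
      // Landmarks get a label but no ref, and indent their children.
      lines.push(`${indent}[${node.name || node.role}]`)
      childDepth = depth + 1
    } else if (INTERACTIVE.has(node.role) && node.name) {
      // Named interactive elements get the next ref and end the descent.
      lines.push(`${indent}[@e${counter++}] ${node.role} "${node.name}"`)
      return
    }
    // Every other role is transparent: only its children are visited.
    for (const childId of node.childIds ?? []) {
      const child = byId.get(childId)
      if (child) {
        walk(child, childDepth)
      }
    }
  }

  if (nodes[0]) {
    walk(nodes[0], 0)
  }
  return lines.join('\n')
}

const example = renderSnapshot([
  { id: '1', role: 'WebArea', name: 'page', childIds: ['2', '5'] },
  { id: '2', role: 'navigation', name: 'Main Nav', childIds: ['3', '4'] },
  { id: '3', role: 'link', name: 'Home' },
  { id: '4', role: 'generic', name: '', childIds: [] },
  { id: '5', role: 'button', name: 'Submit' }
])
// example is three lines: "[Main Nav]", "  [@e1] link "Home"", "[@e2] button "Submit""
```

Because refs are numbered by traversal order, the same page can yield different refs after the DOM changes, which is why a fresh snapshot is needed after each interaction.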
|
||||
|
|
@ -35,6 +35,8 @@ import { CodexAccountService } from './codex-accounts/service'
|
|||
import { CodexRuntimeHomeService } from './codex-accounts/runtime-home-service'
|
||||
import { openCodeHookService } from './opencode/hook-service'
|
||||
import { StarNagService } from './star-nag/service'
|
||||
import { CdpBridge } from './browser/cdp-bridge'
|
||||
import { browserManager } from './browser/browser-manager'
|
||||
|
||||
let mainWindow: BrowserWindow | null = null
|
||||
/** Whether a manual app.quit() (Cmd+Q, etc.) is in progress. Shared with the
|
||||
|
|
@ -158,6 +160,7 @@ app.whenReady().then(async () => {
|
|||
starNag = new StarNagService(store, stats)
|
||||
starNag.start()
|
||||
starNag.registerIpcHandlers()
|
||||
runtime.setCdpBridge(new CdpBridge(browserManager))
|
||||
nativeTheme.themeSource = store.getSettings().theme ?? 'system'
|
||||
registerAppMenu({
|
||||
onCheckForUpdates: () => checkForUpdatesFromMenu(),
|
||||
|
|
|
|||
|
|
@ -2,6 +2,7 @@
|
|||
trust boundary (isTrustedBrowserRenderer) and handler teardown stay consistent. */
|
||||
import { BrowserWindow, dialog, ipcMain } from 'electron'
|
||||
import { browserManager } from '../browser/browser-manager'
|
||||
import type { CdpBridge } from '../browser/cdp-bridge'
|
||||
import { browserSessionRegistry } from '../browser/browser-session-registry'
|
||||
import {
|
||||
pickCookieFile,
|
||||
|
|
@ -28,11 +29,16 @@ import type {
|
|||
} from '../../shared/types'
|
||||
|
||||
let trustedBrowserRendererWebContentsId: number | null = null
|
||||
let cdpBridgeRef: CdpBridge | null = null
|
||||
|
||||
export function setTrustedBrowserRendererWebContentsId(webContentsId: number | null): void {
|
||||
trustedBrowserRendererWebContentsId = webContentsId
|
||||
}
|
||||
|
||||
export function setCdpBridgeRef(bridge: CdpBridge | null): void {
|
||||
cdpBridgeRef = bridge
|
||||
}
|
||||
|
||||
function isTrustedBrowserRenderer(sender: Electron.WebContents): boolean {
|
||||
if (sender.isDestroyed() || sender.getType() !== 'window') {
|
||||
return false
|
||||
|
|
@ -64,6 +70,7 @@ export function registerBrowserHandlers(): void {
|
|||
ipcMain.removeHandler('browser:cancelGrab')
|
||||
ipcMain.removeHandler('browser:captureSelectionScreenshot')
|
||||
ipcMain.removeHandler('browser:extractHoverPayload')
|
||||
ipcMain.removeHandler('browser:activeTabChanged')
|
||||
|
||||
ipcMain.handle(
|
||||
'browser:registerGuest',
|
||||
|
|
@ -71,10 +78,21 @@ export function registerBrowserHandlers(): void {
|
|||
      if (!isTrustedBrowserRenderer(event.sender)) {
        return false
      }
      // Why: when Chromium swaps a guest's renderer process (navigation,
      // crash recovery), the renderer re-registers the same browserPageId
      // with a new webContentsId. If the CDP bridge was tracking the old
      // webContentsId as active, update it to the new one so agent commands
      // don't target a destroyed surface.
      const previousWcId = browserManager.getGuestWebContentsId(args.browserPageId)
      browserManager.registerGuest({
        ...args,
        rendererWebContentsId: event.sender.id
      })
      if (cdpBridgeRef && previousWcId !== null && previousWcId !== args.webContentsId) {
        if (cdpBridgeRef.getActiveWebContentsId() === previousWcId) {
          cdpBridgeRef.onTabChanged(args.webContentsId)
        }
      }
      return true
    }
  )
|
||||
|
|
@ -83,10 +101,34 @@ export function registerBrowserHandlers(): void {
|
|||
    if (!isTrustedBrowserRenderer(event.sender)) {
      return false
    }
    // Why: notify the CDP bridge before unregistering so it can clean up debugger
    // state and ref maps for the closing tab. This must happen before
    // unregisterGuest clears the webContentsId mapping.
    const wcId = browserManager.getGuestWebContentsId(args.browserPageId)
    if (wcId !== null && cdpBridgeRef) {
      cdpBridgeRef.onTabClosed(wcId)
    }
    browserManager.unregisterGuest(args.browserPageId)
    return true
  })

  // Why: keeps the CDP bridge's active tab in sync with the renderer's UI state.
  // Without this, a user switching tabs in the UI would leave the agent operating
  // on the previously active tab.
  ipcMain.handle('browser:activeTabChanged', (event, args: { browserPageId: string }) => {
    if (!isTrustedBrowserRenderer(event.sender)) {
      return false
    }
    if (!cdpBridgeRef) {
      return false
    }
    const wcId = browserManager.getGuestWebContentsId(args.browserPageId)
    if (wcId !== null) {
      cdpBridgeRef.onTabChanged(wcId)
    }
    return true
  })

  ipcMain.handle('browser:openDevTools', (event, args: { browserPageId: string }) => {
    if (!isTrustedBrowserRenderer(event.sender)) {
      return false
|
||||
|
|
|
|||
|
|
@ -20,6 +20,7 @@ const {
|
|||
registerUpdaterHandlersMock,
|
||||
registerRateLimitHandlersMock,
|
||||
registerBrowserHandlersMock,
|
||||
setCdpBridgeRefMock,
|
||||
setTrustedBrowserRendererWebContentsIdMock,
|
||||
registerFilesystemWatcherHandlersMock,
|
||||
registerAppHandlersMock
|
||||
|
|
@ -43,6 +44,7 @@ const {
|
|||
registerUpdaterHandlersMock: vi.fn(),
|
||||
registerRateLimitHandlersMock: vi.fn(),
|
||||
registerBrowserHandlersMock: vi.fn(),
|
||||
setCdpBridgeRefMock: vi.fn(),
|
||||
setTrustedBrowserRendererWebContentsIdMock: vi.fn(),
|
||||
registerFilesystemWatcherHandlersMock: vi.fn(),
|
||||
registerAppHandlersMock: vi.fn()
|
||||
|
|
@ -123,7 +125,8 @@ vi.mock('../window/attach-main-window-services', () => ({
|
|||
|
||||
vi.mock('./browser', () => ({
|
||||
registerBrowserHandlers: registerBrowserHandlersMock,
|
||||
- setTrustedBrowserRendererWebContentsId: setTrustedBrowserRendererWebContentsIdMock
+ setTrustedBrowserRendererWebContentsId: setTrustedBrowserRendererWebContentsIdMock,
+ setCdpBridgeRef: setCdpBridgeRefMock
|
||||
}))
|
||||
|
||||
vi.mock('./app', () => ({
|
||||
|
|
@ -153,6 +156,7 @@ describe('registerCoreHandlers', () => {
|
|||
registerUpdaterHandlersMock.mockReset()
|
||||
registerRateLimitHandlersMock.mockReset()
|
||||
registerBrowserHandlersMock.mockReset()
|
||||
setCdpBridgeRefMock.mockReset()
|
||||
setTrustedBrowserRendererWebContentsIdMock.mockReset()
|
||||
registerFilesystemWatcherHandlersMock.mockReset()
|
||||
registerAppHandlersMock.mockReset()
|
||||
|
|
@ -160,7 +164,7 @@ describe('registerCoreHandlers', () => {
|
|||
|
||||
it('passes the store through to handler registrars that need it', () => {
|
||||
const store = { marker: 'store' }
|
||||
- const runtime = { marker: 'runtime' }
+ const runtime = { marker: 'runtime', getCdpBridge: () => null }
|
||||
const stats = { marker: 'stats' }
|
||||
const claudeUsage = { marker: 'claudeUsage' }
|
||||
const codexUsage = { marker: 'codexUsage' }
|
||||
|
|
@ -204,7 +208,7 @@ describe('registerCoreHandlers', () => {
|
|||
// The first test already called registerCoreHandlers, so the module-level
|
||||
// guard is now set. beforeEach reset all mocks, so call counts are 0.
|
||||
const store2 = { marker: 'store2' }
|
||||
- const runtime2 = { marker: 'runtime2' }
+ const runtime2 = { marker: 'runtime2', getCdpBridge: () => null }
|
||||
const stats2 = { marker: 'stats2' }
|
||||
const claudeUsage2 = { marker: 'claudeUsage2' }
|
||||
const codexUsage2 = { marker: 'codexUsage2' }
|
||||
|
|
|
|||
|
|
@ -14,7 +14,7 @@ import { registerStatsHandlers } from './stats'
|
|||
import { registerRateLimitHandlers } from './rate-limits'
|
||||
import { registerRuntimeHandlers } from './runtime'
|
||||
import { registerNotificationHandlers } from './notifications'
|
||||
- import { setTrustedBrowserRendererWebContentsId } from './browser'
+ import { setTrustedBrowserRendererWebContentsId, setCdpBridgeRef } from './browser'
|
||||
import { registerSessionHandlers } from './session'
|
||||
import { registerSettingsHandlers } from './settings'
|
||||
import { registerBrowserHandlers } from './browser'
|
||||
|
|
@ -49,6 +49,7 @@ export function registerCoreHandlers(
|
|||
// if a channel is registered twice, so we guard to register only once and
|
||||
// just update the per-window web-contents ID on subsequent calls.
|
||||
setTrustedBrowserRendererWebContentsId(mainWindowWebContentsId)
|
||||
setCdpBridgeRef(runtime.getCdpBridge())
|
||||
if (registered) {
|
||||
return
|
||||
}
|
||||
|
|
|
|||
|
|
@ -23,8 +23,22 @@ import type {
|
|||
RuntimeSyncedLeaf,
|
||||
RuntimeSyncedTab,
|
||||
RuntimeSyncWindowGraph,
|
||||
- RuntimeWorktreeListResult
+ RuntimeWorktreeListResult,
|
||||
BrowserSnapshotResult,
|
||||
BrowserClickResult,
|
||||
BrowserGotoResult,
|
||||
BrowserFillResult,
|
||||
BrowserTypeResult,
|
||||
BrowserSelectResult,
|
||||
BrowserScrollResult,
|
||||
BrowserBackResult,
|
||||
BrowserReloadResult,
|
||||
BrowserScreenshotResult,
|
||||
BrowserEvalResult,
|
||||
BrowserTabListResult,
|
||||
BrowserTabSwitchResult
|
||||
} from '../../shared/runtime-types'
|
||||
import type { CdpBridge } from '../browser/cdp-bridge'
|
||||
import { getPRForBranch } from '../github/client'
|
||||
import {
|
||||
getGitUsername,
|
||||
|
|
@ -149,6 +163,7 @@ export class OrcaRuntimeService {
|
|||
private waitersByHandle = new Map<string, Set<TerminalWaiter>>()
|
||||
private ptyController: RuntimePtyController | null = null
|
||||
private notifier: RuntimeNotifier | null = null
|
||||
private cdpBridge: CdpBridge | null = null
|
||||
private resolvedWorktreeCache: ResolvedWorktreeCache | null = null
|
||||
private agentDetector: AgentDetector | null = null
|
||||
|
||||
|
|
@ -189,6 +204,14 @@ export class OrcaRuntimeService {
|
|||
this.notifier = notifier
|
||||
}
|
||||
|
||||
setCdpBridge(bridge: CdpBridge | null): void {
|
||||
this.cdpBridge = bridge
|
||||
}
|
||||
|
||||
getCdpBridge(): CdpBridge | null {
|
||||
return this.cdpBridge
|
||||
}
|
||||
|
||||
attachWindow(windowId: number): void {
|
||||
if (this.authoritativeWindowId === null) {
|
||||
this.authoritativeWindowId = windowId
|
||||
|
|
@ -1109,6 +1132,70 @@ export class OrcaRuntimeService {
|
|||
private getLeafKey(tabId: string, leafId: string): string {
|
||||
return `${tabId}::${leafId}`
|
||||
}
|
||||
|
||||
// ── Browser automation ──
|
||||
|
||||
private requireCdpBridge(): CdpBridge {
|
||||
if (!this.cdpBridge) {
|
||||
throw new Error('runtime_unavailable')
|
||||
}
|
||||
return this.cdpBridge
|
||||
}
|
||||
|
||||
async browserSnapshot(): Promise<BrowserSnapshotResult> {
|
||||
return this.requireCdpBridge().snapshot()
|
||||
}
|
||||
|
||||
async browserClick(params: { element: string }): Promise<BrowserClickResult> {
|
||||
return this.requireCdpBridge().click(params.element)
|
||||
}
|
||||
|
||||
async browserGoto(params: { url: string }): Promise<BrowserGotoResult> {
|
||||
return this.requireCdpBridge().goto(params.url)
|
||||
}
|
||||
|
||||
async browserFill(params: { element: string; value: string }): Promise<BrowserFillResult> {
|
||||
return this.requireCdpBridge().fill(params.element, params.value)
|
||||
}
|
||||
|
||||
async browserType(params: { input: string }): Promise<BrowserTypeResult> {
|
||||
return this.requireCdpBridge().type(params.input)
|
||||
}
|
||||
|
||||
async browserSelect(params: { element: string; value: string }): Promise<BrowserSelectResult> {
|
||||
return this.requireCdpBridge().select(params.element, params.value)
|
||||
}
|
||||
|
||||
async browserScroll(params: {
|
||||
direction: 'up' | 'down'
|
||||
amount?: number
|
||||
}): Promise<BrowserScrollResult> {
|
||||
return this.requireCdpBridge().scroll(params.direction, params.amount)
|
||||
}
|
||||
|
||||
async browserBack(): Promise<BrowserBackResult> {
|
||||
return this.requireCdpBridge().back()
|
||||
}
|
||||
|
||||
async browserReload(): Promise<BrowserReloadResult> {
|
||||
return this.requireCdpBridge().reload()
|
||||
}
|
||||
|
||||
async browserScreenshot(params: { format?: 'png' | 'jpeg' }): Promise<BrowserScreenshotResult> {
|
||||
return this.requireCdpBridge().screenshot(params.format)
|
||||
}
|
||||
|
||||
async browserEval(params: { expression: string }): Promise<BrowserEvalResult> {
|
||||
return this.requireCdpBridge().evaluate(params.expression)
|
||||
}
|
||||
|
||||
browserTabList(): BrowserTabListResult {
|
||||
return this.requireCdpBridge().tabList()
|
||||
}
|
||||
|
||||
async browserTabSwitch(params: { index: number }): Promise<BrowserTabSwitchResult> {
|
||||
return this.requireCdpBridge().tabSwitch(params.index)
|
||||
}
|
||||
}
|
||||
|
||||
const MAX_TAIL_LINES = 120
|
||||
|
|
|
|||
|
|
@ -701,6 +701,189 @@ export class OrcaRuntimeRpcServer {
|
|||
}
|
||||
}
|
||||
|
||||
// ── Browser automation routes ──
|
||||
|
||||
if (request.method === 'browser.snapshot') {
|
||||
try {
|
||||
const result = await this.runtime.browserSnapshot()
|
||||
return this.successResponse(request.id, result)
|
||||
} catch (error) {
|
||||
return this.browserErrorResponse(request.id, error)
|
||||
}
|
||||
}
|
||||
|
||||
if (request.method === 'browser.click') {
|
||||
try {
|
||||
const params = this.extractParams(request)
|
||||
const element = typeof params?.element === 'string' ? params.element : null
|
||||
if (!element) {
|
||||
return this.errorResponse(request.id, 'invalid_argument', 'Missing required --element')
|
||||
}
|
||||
const result = await this.runtime.browserClick({ element })
|
||||
return this.successResponse(request.id, result)
|
||||
} catch (error) {
|
||||
return this.browserErrorResponse(request.id, error)
|
||||
}
|
||||
}
|
||||
|
||||
if (request.method === 'browser.goto') {
|
||||
try {
|
||||
const params = this.extractParams(request)
|
||||
const url = typeof params?.url === 'string' ? params.url : null
|
||||
if (!url) {
|
||||
return this.errorResponse(request.id, 'invalid_argument', 'Missing required --url')
|
||||
}
|
||||
const result = await this.runtime.browserGoto({ url })
|
||||
return this.successResponse(request.id, result)
|
||||
} catch (error) {
|
||||
return this.browserErrorResponse(request.id, error)
|
||||
}
|
||||
}
|
||||
|
||||
if (request.method === 'browser.fill') {
|
||||
try {
|
||||
const params = this.extractParams(request)
|
||||
const element = typeof params?.element === 'string' ? params.element : null
|
||||
const value = typeof params?.value === 'string' ? params.value : null
|
||||
if (!element) {
|
||||
return this.errorResponse(request.id, 'invalid_argument', 'Missing required --element')
|
||||
}
|
||||
if (value === null) {
|
||||
return this.errorResponse(request.id, 'invalid_argument', 'Missing required --value')
|
||||
}
|
||||
const result = await this.runtime.browserFill({ element, value })
|
||||
return this.successResponse(request.id, result)
|
||||
} catch (error) {
|
||||
return this.browserErrorResponse(request.id, error)
|
||||
}
|
||||
}
|
||||
|
||||
if (request.method === 'browser.type') {
|
||||
try {
|
||||
const params = this.extractParams(request)
|
||||
const input = typeof params?.input === 'string' ? params.input : null
|
||||
if (!input) {
|
||||
return this.errorResponse(request.id, 'invalid_argument', 'Missing required --input')
|
||||
}
|
||||
const result = await this.runtime.browserType({ input })
|
||||
return this.successResponse(request.id, result)
|
||||
} catch (error) {
|
||||
return this.browserErrorResponse(request.id, error)
|
||||
}
|
||||
}
|
||||
|
||||
if (request.method === 'browser.select') {
|
||||
try {
|
||||
const params = this.extractParams(request)
|
||||
const element = typeof params?.element === 'string' ? params.element : null
|
||||
const value = typeof params?.value === 'string' ? params.value : null
|
||||
if (!element) {
|
||||
return this.errorResponse(request.id, 'invalid_argument', 'Missing required --element')
|
||||
}
|
||||
if (value === null) {
|
||||
return this.errorResponse(request.id, 'invalid_argument', 'Missing required --value')
|
||||
}
|
||||
const result = await this.runtime.browserSelect({ element, value })
|
||||
return this.successResponse(request.id, result)
|
||||
} catch (error) {
|
||||
return this.browserErrorResponse(request.id, error)
|
||||
}
|
||||
}
|
||||
|
||||
if (request.method === 'browser.scroll') {
|
||||
try {
|
||||
const params = this.extractParams(request)
|
||||
const direction = typeof params?.direction === 'string' ? params.direction : null
|
||||
if (direction !== 'up' && direction !== 'down') {
|
||||
return this.errorResponse(
|
||||
request.id,
|
||||
'invalid_argument',
|
||||
'Missing required --direction (up or down)'
|
||||
)
|
||||
}
|
||||
const amount =
|
||||
typeof params?.amount === 'number' && params.amount > 0 ? params.amount : undefined
|
||||
const result = await this.runtime.browserScroll({ direction, amount })
|
||||
return this.successResponse(request.id, result)
|
||||
} catch (error) {
|
||||
return this.browserErrorResponse(request.id, error)
|
||||
}
|
||||
}
|
||||
|
||||
    if (request.method === 'browser.back') {
      try {
        const result = await this.runtime.browserBack()
        return this.successResponse(request.id, result)
      } catch (error) {
        return this.browserErrorResponse(request.id, error)
      }
    }

    if (request.method === 'browser.reload') {
      try {
        const result = await this.runtime.browserReload()
        return this.successResponse(request.id, result)
      } catch (error) {
        return this.browserErrorResponse(request.id, error)
      }
    }

    if (request.method === 'browser.screenshot') {
      try {
        const params = this.extractParams(request)
        const format =
          typeof params?.format === 'string' &&
          (params.format === 'png' || params.format === 'jpeg')
            ? params.format
            : undefined
        const result = await this.runtime.browserScreenshot({ format })
        return this.successResponse(request.id, result)
      } catch (error) {
        return this.browserErrorResponse(request.id, error)
      }
    }

    if (request.method === 'browser.eval') {
      try {
        const params = this.extractParams(request)
        const expression = typeof params?.expression === 'string' ? params.expression : null
        if (!expression) {
          return this.errorResponse(request.id, 'invalid_argument', 'Missing required --expression')
        }
        const result = await this.runtime.browserEval({ expression })
        return this.successResponse(request.id, result)
      } catch (error) {
        return this.browserErrorResponse(request.id, error)
      }
    }

    if (request.method === 'browser.tabList') {
      try {
        const result = this.runtime.browserTabList()
        return this.successResponse(request.id, result)
      } catch (error) {
        return this.browserErrorResponse(request.id, error)
      }
    }

    if (request.method === 'browser.tabSwitch') {
      try {
        const params = this.extractParams(request)
        const index = typeof params?.index === 'number' ? params.index : null
        if (index === null || !Number.isInteger(index) || index < 0) {
          return this.errorResponse(
            request.id,
            'invalid_argument',
            'Missing required --index (non-negative integer)'
          )
        }
        const result = await this.runtime.browserTabSwitch({ index })
        return this.successResponse(request.id, result)
      } catch (error) {
        return this.browserErrorResponse(request.id, error)
      }
    }

    return this.errorResponse(request.id, 'method_not_found', `Unknown method: ${request.method}`)
  }
|
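Review note: the handlers in this hunk share one pattern. Each pulls `params` off the request, narrows every field by runtime type, and returns `invalid_argument` before touching the runtime if anything is off. A minimal standalone sketch of that narrowing follows. The helper names `parseScrollParams` and `parseTabIndex` and the `Parsed` result type are hypothetical and not part of this commit; only the checks and messages are taken from the diff.

```typescript
type Params = Record<string, unknown> | null

// Hypothetical result type, used only for this illustration.
type Parsed<T> = { ok: true; value: T } | { ok: false; message: string }

// Same checks as the browser.scroll handler: direction must be 'up' or 'down',
// and amount is kept only when it is a positive number.
function parseScrollParams(
  params: Params
): Parsed<{ direction: 'up' | 'down'; amount?: number }> {
  const direction = typeof params?.direction === 'string' ? params.direction : null
  if (direction !== 'up' && direction !== 'down') {
    return { ok: false, message: 'Missing required --direction (up or down)' }
  }
  const amount =
    typeof params?.amount === 'number' && params.amount > 0 ? params.amount : undefined
  return { ok: true, value: { direction, amount } }
}

// Same checks as the browser.tabSwitch handler: a 0-based, non-negative integer.
function parseTabIndex(params: Params): Parsed<number> {
  const index = typeof params?.index === 'number' ? params.index : null
  if (index === null || !Number.isInteger(index) || index < 0) {
    return { ok: false, message: 'Missing required --index (non-negative integer)' }
  }
  return { ok: true, value: index }
}
```

Note that index `0` passes the check, which matches the 0-based tab index fix mentioned in the commit message, while `1.5`, `-1`, and string values such as `'2'` are rejected.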
@@ -718,6 +901,38 @@ export class OrcaRuntimeRpcServer {
    }
  }

  private successResponse(id: string, result: unknown): RuntimeRpcResponse {
    return {
      id,
      ok: true,
      result,
      _meta: {
        runtimeId: this.runtime.getRuntimeId()
      }
    }
  }

  private extractParams(request: { params?: unknown }): Record<string, unknown> | null {
    return request.params && typeof request.params === 'object' && request.params !== null
      ? (request.params as Record<string, unknown>)
      : null
  }

  // Why: browser errors carry a structured .code property (BrowserError from
  // cdp-bridge.ts) that maps directly to agent-facing error codes. We forward
  // that code rather than relying on the message-matching pattern used by
  // runtimeErrorResponse, which would require adding 10+ entries to the allowlist.
  private browserErrorResponse(id: string, error: unknown): RuntimeRpcResponse {
    if (
      error instanceof Error &&
      'code' in error &&
      typeof (error as { code: unknown }).code === 'string'
    ) {
      return this.errorResponse(id, (error as { code: string }).code, error.message)
    }
    return this.runtimeErrorResponse(id, error)
  }

  private runtimeErrorResponse(id: string, error: unknown): RuntimeRpcResponse {
    const message = error instanceof Error ? error.message : String(error)
    if (
|||
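Review note: the "Why" comment on `browserErrorResponse` describes forwarding a structured `.code` instead of matching on message text. The sketch below shows that duck-typed check in isolation. The `BrowserError` class here is a hypothetical stand-in (the real one lives in `cdp-bridge.ts` and is not shown in this part of the diff), and `toErrorPayload` with its `'runtime_error'` fallback code is invented for illustration; the actual fallback path is `runtimeErrorResponse`.

```typescript
// Hypothetical stand-in for BrowserError from cdp-bridge.ts.
class BrowserError extends Error {
  readonly code: string

  constructor(code: string, message: string) {
    super(message)
    this.name = 'BrowserError'
    this.code = code
  }
}

type ErrorPayload = { code: string; message: string }

// Forward error.code when the thrown value is an Error carrying a string code;
// otherwise fall back to a generic code (an assumption made for this sketch).
function toErrorPayload(error: unknown, fallbackCode = 'runtime_error'): ErrorPayload {
  if (
    error instanceof Error &&
    'code' in error &&
    typeof (error as { code: unknown }).code === 'string'
  ) {
    return { code: (error as { code: string }).code, message: error.message }
  }
  const message = error instanceof Error ? error.message : String(error)
  return { code: fallbackCode, message }
}
```

Because the check looks only for a string `code` property, any `Error` subclass that sets one is forwarded, not just `BrowserError`. That keeps the RPC layer free of an import from the CDP bridge, at the cost of also forwarding codes from unrelated errors that happen to carry one.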
1
src/preload/api-types.d.ts
vendored
@@ -140,6 +140,7 @@ export type BrowserApi = {
    browserProfile?: string
  }) => Promise<BrowserCookieImportResult>
  sessionClearDefaultCookies: () => Promise<boolean>
  notifyActiveTabChanged: (args: { browserPageId: string }) => Promise<boolean>
}

export type DetectedBrowserProfileInfo = {
@@ -748,7 +748,10 @@ const api = {
     > => ipcRenderer.invoke('browser:session:importFromBrowser', args),
 
     sessionClearDefaultCookies: (): Promise<boolean> =>
-      ipcRenderer.invoke('browser:session:clearDefaultCookies')
+      ipcRenderer.invoke('browser:session:clearDefaultCookies'),
+
+    notifyActiveTabChanged: (args: { browserPageId: string }): Promise<boolean> =>
+      ipcRenderer.invoke('browser:activeTabChanged', args)
   },
 
   hooks: {
@@ -595,6 +595,17 @@ export const createBrowserSlice: StateCreator<AppState, [], [], BrowserSlice> =
      }
    })

    // Why: notify the CDP bridge which guest webContents is now active so
    // subsequent agent commands (snapshot, click, etc.) target the correct tab.
    // registerGuest uses page IDs (not workspace IDs), so we resolve the active
    // page within the workspace to find the correct browserPageId.
    const workspace = findWorkspace(get().browserTabsByWorktree, tabId)
    if (workspace?.activePageId && typeof window !== 'undefined' && window.api?.browser) {
      window.api.browser
        .notifyActiveTabChanged({ browserPageId: workspace.activePageId })
        .catch(() => {})
    }

    const item = Object.values(get().unifiedTabsByWorktree)
      .flat()
      .find((entry) => entry.contentType === 'browser' && entry.entityId === tabId)
@@ -796,6 +807,12 @@ export const createBrowserSlice: StateCreator<AppState, [], [], BrowserSlice> =
      }
    })

    // Why: switching the active page within a workspace changes which guest
    // webContents the CDP bridge should target for agent commands.
    if (typeof window !== 'undefined' && window.api?.browser) {
      window.api.browser.notifyActiveTabChanged({ browserPageId: pageId }).catch(() => {})
    }

    const workspace = findWorkspace(get().browserTabsByWorktree, workspaceId)
    if (!workspace) {
      return
@@ -152,3 +152,89 @@ export type RuntimeWorktreeListResult = {
  totalCount: number
  truncated: boolean
}

// ── Browser automation types ──

export type BrowserSnapshotRef = {
  ref: string
  role: string
  name: string
}

export type BrowserSnapshotResult = {
  snapshot: string
  refs: BrowserSnapshotRef[]
  url: string
  title: string
}

export type BrowserClickResult = {
  clicked: string
}

export type BrowserGotoResult = {
  url: string
  title: string
}

export type BrowserFillResult = {
  filled: string
}

export type BrowserTypeResult = {
  typed: boolean
}

export type BrowserSelectResult = {
  selected: string
}

export type BrowserScrollResult = {
  scrolled: 'up' | 'down'
}

export type BrowserBackResult = {
  url: string
  title: string
}

export type BrowserReloadResult = {
  url: string
  title: string
}

export type BrowserScreenshotResult = {
  data: string
  format: 'png' | 'jpeg'
}

export type BrowserEvalResult = {
  value: string
}

export type BrowserTabInfo = {
  index: number
  url: string
  title: string
  active: boolean
}

export type BrowserTabListResult = {
  tabs: BrowserTabInfo[]
}

export type BrowserTabSwitchResult = {
  switched: number
}

export type BrowserErrorCode =
  | 'browser_no_tab'
  | 'browser_tab_not_found'
  | 'browser_stale_ref'
  | 'browser_ref_not_found'
  | 'browser_navigation_failed'
  | 'browser_element_not_interactable'
  | 'browser_eval_error'
  | 'browser_cdp_error'
  | 'browser_debugger_detached'
  | 'browser_timeout'
|||
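Review note: since `BrowserErrorCode` is a closed union, a caller can narrow an arbitrary `code` string from an RPC error response and branch on it. The sketch below is one way that might look. `BROWSER_ERROR_CODES`, `isBrowserErrorCode`, and `suggestRecovery` are hypothetical names, and the recovery mapping is an assumption based on the snapshot / interact / re-snapshot loop described in SKILL.md, not behavior defined by this commit.

```typescript
// The code list is copied from the BrowserErrorCode union in this diff.
const BROWSER_ERROR_CODES = [
  'browser_no_tab',
  'browser_tab_not_found',
  'browser_stale_ref',
  'browser_ref_not_found',
  'browser_navigation_failed',
  'browser_element_not_interactable',
  'browser_eval_error',
  'browser_cdp_error',
  'browser_debugger_detached',
  'browser_timeout'
] as const

type BrowserErrorCode = (typeof BROWSER_ERROR_CODES)[number]

// Type guard: narrows an arbitrary string to the BrowserErrorCode union.
function isBrowserErrorCode(code: string): code is BrowserErrorCode {
  return (BROWSER_ERROR_CODES as readonly string[]).includes(code)
}

// Assumption: ref-related errors mean the refs are out of date, so taking a
// fresh snapshot is the natural next step; everything else is just reported.
function suggestRecovery(code: BrowserErrorCode): 'resnapshot' | 'report' {
  switch (code) {
    case 'browser_stale_ref':
    case 'browser_ref_not_found':
      return 'resnapshot'
    default:
      return 'report'
  }
}
```

Deriving the union from a `const` array keeps the runtime list and the type in sync; the commit itself declares the union directly, which is equally valid when no runtime check is needed.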