checkpoint: browser automation CDP bridge with CLI commands

Working: snapshot, click, goto, fill, type, select, scroll, back, reload,
screenshot, eval, tab list, tab switch. Includes stale webContentsId fix,
CLI dev-mode support (ORCA_USER_DATA_PATH), CDP command timeout, and
0-based tab index fix.
Jinwoo-H 2026-04-18 15:04:32 -04:00
parent 3bbe9ed712
commit 99decd7f28
20 changed files with 2489 additions and 13 deletions

@ -0,0 +1,138 @@
---
name: orca-browser
description: >
Use the Orca browser commands to automate the built-in browser.
Triggers: "click on", "fill the form", "take a screenshot",
"navigate to", "interact with the page", "extract text from",
"snapshot the page", or any task involving browser automation.
allowed-tools: Bash(orca:*)
---
# Orca Browser Automation
Use these commands when the agent needs to interact with the built-in Orca browser — navigating pages, reading page content, clicking elements, filling forms, or verifying UI state.
## Core Loop
The browser automation workflow follows a snapshot-interact-re-snapshot loop:
1. **Snapshot** the page to see interactive elements and their refs.
2. **Interact** using refs (`@e1`, `@e3`, etc.) to click, fill, or select.
3. **Re-snapshot** after interactions to see the updated page state.
```bash
orca goto --url https://example.com --json
orca snapshot --json
# Read the refs from the snapshot output
orca click --element @e3 --json
orca snapshot --json
```
## Element Refs
Refs like `@e1`, `@e5` are short identifiers assigned to interactive page elements during a snapshot. They are:
- **Assigned by snapshot**: Run `orca snapshot` to get current refs.
- **Scoped to one tab**: Refs from one tab are not valid in another.
- **Invalidated by navigation**: If the page navigates after a snapshot, refs become stale. Re-snapshot to get fresh refs.
- **Invalidated by tab switch**: Switching tabs with `orca tab switch` invalidates refs. Re-snapshot after switching.
If a ref is stale, or no snapshot has been taken yet in the tab, the command returns `browser_stale_ref`; re-snapshot and retry.
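
The lifecycle rules above can be read as a small state machine. The sketch below is a hypothetical model, not the actual CDP bridge code: `RefTable`, `RefLookup`, and the numeric node ids are names invented for illustration. It assumes one table per tab, and it mirrors what the integration tests in this commit expect: a ref used before any snapshot, or after navigation, yields `browser_stale_ref`, while an unknown ref against a valid snapshot yields `browser_ref_not_found`.

```typescript
// Hypothetical model of the ref lifecycle; not the real CdpBridge implementation.
type RefLookup =
  | { ok: true; backendNodeId: number }
  | { ok: false; code: 'browser_stale_ref' | 'browser_ref_not_found' }

class RefTable {
  // null means "no valid snapshot": none was taken yet, or it was invalidated.
  private refs: Map<string, number> | null = null

  // Assign @e1, @e2, ... to the interactive nodes in snapshot order.
  snapshot(backendNodeIds: number[]): string[] {
    const next = new Map<string, number>()
    backendNodeIds.forEach((nodeId, i) => next.set(`@e${i + 1}`, nodeId))
    this.refs = next
    return Array.from(next.keys())
  }

  // Navigation and tab switches drop every ref at once.
  invalidate(): void {
    this.refs = null
  }

  // Resolve a ref, distinguishing "stale" from "never assigned".
  resolve(ref: string): RefLookup {
    if (this.refs === null) {
      return { ok: false, code: 'browser_stale_ref' }
    }
    const backendNodeId = this.refs.get(ref)
    if (backendNodeId === undefined) {
      return { ok: false, code: 'browser_ref_not_found' }
    }
    return { ok: true, backendNodeId }
  }
}
```

Keeping "no snapshot" and "invalidated snapshot" in the same state is a simplification; it is enough to explain why both cases call for the same recovery, a fresh `orca snapshot`.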
## Commands
### Navigation
```bash
orca goto --url <url> [--json] # Navigate to URL, waits for page load
orca back [--json] # Go back in browser history
orca reload [--json] # Reload the current page
```
### Observation
```bash
orca snapshot [--json] # Accessibility tree snapshot with element refs
orca screenshot [--format <png|jpeg>] [--json] # Viewport screenshot (base64)
```
### Interaction
```bash
orca click --element <ref> [--json] # Click an element by ref
orca fill --element <ref> --value <text> [--json] # Clear and fill an input
orca type --input <text> [--json] # Type at current focus (no element targeting)
orca select --element <ref> --value <value> [--json] # Select dropdown option
orca scroll --direction <up|down> [--amount <pixels>] [--json] # Scroll viewport
```
### Tab Management
```bash
orca tab list [--json] # List open browser tabs
orca tab switch --index <n> [--json] # Switch active tab by 0-based index (invalidates refs)
```
### Page Inspection
```bash
orca eval --expression <js> [--json] # Evaluate JS in page context
```
## `fill` vs `type`
- **`fill`** targets a specific element by ref, clears its value first, then enters text. Use for form fields.
- **`type`** sends text to whichever element currently has focus, with no ref targeting. Use it for search boxes or right after clicking into an input.
## Error Codes and Recovery
| Error Code | Meaning | Recovery |
|-----------|---------|----------|
| `browser_no_tab` | No browser tab is open | Open a tab in the Orca UI, or use `orca tab list` to check |
| `browser_stale_ref` | Ref is invalid (page changed since snapshot, or no snapshot taken yet) | Run `orca snapshot` to get fresh refs |
| `browser_ref_not_found` | Ref was never assigned (typo or out of range) | Run `orca snapshot` to see available refs |
| `browser_tab_not_found` | Tab index does not exist | Run `orca tab list` to see available tabs |
| `browser_navigation_failed` | URL could not be loaded | Check URL spelling, network connectivity |
| `browser_element_not_interactable` | Element is hidden or disabled | Re-snapshot; the element may have changed state |
| `browser_eval_error` | JavaScript threw an exception | Fix the expression and retry |
| `browser_cdp_error` | Internal browser control error | DevTools may be open — close them and retry |
| `browser_debugger_detached` | Tab was closed | Run `orca tab list` to find remaining tabs |
| `browser_timeout` | Operation timed out | Page may be slow to load; retry or check network |
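
For an agent that branches on the error code, the table above can be condensed into a lookup. This is a sketch under assumptions: `FOLLOW_UP` and `followUpCommand` are hypothetical names, and the choice to return `null` for codes whose recovery needs a changed input or human action is an interpretation of the table, not behavior of the CLI.

```typescript
// Error codes copied from the table above.
type BrowserErrorCode =
  | 'browser_no_tab'
  | 'browser_stale_ref'
  | 'browser_ref_not_found'
  | 'browser_tab_not_found'
  | 'browser_navigation_failed'
  | 'browser_element_not_interactable'
  | 'browser_eval_error'
  | 'browser_cdp_error'
  | 'browser_debugger_detached'
  | 'browser_timeout'

// Hypothetical mapping: the follow-up command an agent could run next, or
// null when the recovery is "fix the input", "retry", or "ask a human".
const FOLLOW_UP: Record<BrowserErrorCode, string | null> = {
  browser_no_tab: 'orca tab list --json',
  browser_stale_ref: 'orca snapshot --json',
  browser_ref_not_found: 'orca snapshot --json',
  browser_tab_not_found: 'orca tab list --json',
  browser_navigation_failed: null,
  browser_element_not_interactable: 'orca snapshot --json',
  browser_eval_error: null,
  browser_cdp_error: null,
  browser_debugger_detached: 'orca tab list --json',
  browser_timeout: null
}

// Look up the follow-up for an arbitrary code string; unknown codes give null.
function followUpCommand(code: string): string | null {
  if (Object.prototype.hasOwnProperty.call(FOLLOW_UP, code)) {
    return FOLLOW_UP[code as BrowserErrorCode]
  }
  return null
}
```

Typing the map as `Record<BrowserErrorCode, string | null>` makes the compiler flag any code from the table that is left out.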
## Worked Example
Agent fills a login form and verifies the dashboard loads:
```bash
# Navigate to the login page
orca goto --url https://app.example.com/login --json
# See what's on the page
orca snapshot --json
# Output includes:
# [@e1] text input "Email"
# [@e2] text input "Password"
# [@e3] button "Sign In"
# Fill the form
orca fill --element @e1 --value "user@example.com" --json
orca fill --element @e2 --value "s3cret" --json
# Submit
orca click --element @e3 --json
# Verify the dashboard loaded
orca snapshot --json
# Output should show dashboard content, not the login form
```
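
When the snapshot is consumed as JSON, hard-coding `@e1` or `@e3` is brittle because refs are reassigned on every snapshot. A minimal sketch of looking a ref up by accessible name follows. It assumes the snapshot result carries a `refs` array of `{ ref, role, name }` objects, which is the shape the integration test in this commit asserts on; `findRef` is a hypothetical helper, and the role strings in the usage note are illustrative.

```typescript
// Shape of one entry in the snapshot's `refs` array (as asserted in the tests).
type SnapshotRef = { ref: string; role: string; name: string }

// Hypothetical helper: return the ref of the first element whose accessible
// name matches, optionally also requiring a role. Returns undefined if none.
function findRef(refs: SnapshotRef[], name: string, role?: string): string | undefined {
  const match = refs.find((r) => r.name === name && (role === undefined || r.role === role))
  return match?.ref
}
```

For example, given `[{ ref: '@e1', role: 'textbox', name: 'Email' }, { ref: '@e3', role: 'button', name: 'Sign In' }]`, calling `findRef(refs, 'Sign In')` returns `'@e3'`, which can then be passed to `orca click --element`.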
## Agent Guidance
- Always pass `--json` when a program, not a person, reads the output.
- Always snapshot before interacting with elements.
- After navigation (`goto`, `back`, `reload`, clicking a link), re-snapshot to get fresh refs.
- After switching tabs, re-snapshot.
- If you get `browser_stale_ref`, re-snapshot and retry with the new refs.
- Use `orca tab list` before `orca tab switch` to know which tabs exist.
- Use `orca eval` as an escape hatch for interactions not covered by other commands.
- For full IDE/worktree/terminal commands, see the `orca-cli` skill.
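
The guidance above (snapshot first, re-snapshot and retry on `browser_stale_ref`) can be sketched as a small retry loop. Everything here is hypothetical scaffolding: `BrowserDriver`, `Outcome`, and `clickByName` are invented for illustration, and the driver is injected so the sketch makes no claim about how the `orca` CLI is actually spawned or how its JSON output is wrapped.

```typescript
type SnapshotRef = { ref: string; name: string }
type Outcome = { ok: true } | { ok: false; code: string }

// Hypothetical abstraction over `orca snapshot` and `orca click`.
interface BrowserDriver {
  snapshot(): Promise<SnapshotRef[]>
  click(ref: string): Promise<Outcome>
}

// Click the element with the given accessible name, re-snapshotting and
// retrying only when the failure is a stale ref.
async function clickByName(
  driver: BrowserDriver,
  name: string,
  maxAttempts = 3
): Promise<Outcome> {
  let last: Outcome = { ok: false, code: 'browser_ref_not_found' }
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // Always snapshot before interacting so the refs are current.
    const refs = await driver.snapshot()
    const target = refs.find((r) => r.name === name)
    if (target === undefined) {
      return { ok: false, code: 'browser_ref_not_found' }
    }
    last = await driver.click(target.ref)
    if (last.ok || last.code !== 'browser_stale_ref') {
      return last
    }
    // Stale ref: loop around to re-snapshot and retry with fresh refs.
  }
  return last
}
```

Bounding the loop with `maxAttempts` keeps a page that navigates continuously from trapping the agent in endless re-snapshots.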

@ -167,6 +167,14 @@ Why: terminal handles are runtime-scoped and may go stale after reloads. If Orca
- If the user asks for CLI UX feedback, test the public `orca` command first. Only inspect `src/cli` or use `node out/cli/index.js` if the public command is missing or the task is explicitly about implementation internals.
- If a command fails, prefer retrying with the public `orca` command before concluding the CLI is broken, unless the failure already came from `orca` itself.
## Browser Commands
`orca` also supports browser automation commands for driving the built-in Orca browser. The core loop is: snapshot the page to get element refs → interact using refs → re-snapshot to see the updated state.
Key commands: `orca snapshot`, `orca click --element @e3`, `orca fill --element @e5 --value "hello"`, `orca goto --url <url>`, `orca tab list`, `orca tab switch --index <n>`.
For the full browser command reference, error codes, and worked examples, see the `orca-browser` skill.
## Important Constraints
- Orca CLI only talks to a running Orca editor.

@ -35,7 +35,23 @@ vi.mock('./runtime-client', () => {
}
})
import { buildCurrentWorktreeSelector, main, normalizeWorktreeSelector } from './index'
import {
buildCurrentWorktreeSelector,
COMMAND_SPECS,
main,
normalizeWorktreeSelector
} from './index'
describe('COMMAND_SPECS collision check', () => {
it('has no duplicate command paths', () => {
const seen = new Set<string>()
for (const spec of COMMAND_SPECS) {
const key = spec.path.join(' ')
expect(seen.has(key), `Duplicate COMMAND_SPECS path: "${key}"`).toBe(false)
seen.add(key)
}
})
})
describe('orca cli worktree awareness', () => {
beforeEach(() => {

@ -13,7 +13,20 @@ import type {
RuntimeTerminalListResult,
RuntimeTerminalShow,
RuntimeTerminalSend,
RuntimeTerminalWait
RuntimeTerminalWait,
BrowserSnapshotResult,
BrowserClickResult,
BrowserGotoResult,
BrowserFillResult,
BrowserTypeResult,
BrowserSelectResult,
BrowserScrollResult,
BrowserBackResult,
BrowserReloadResult,
BrowserScreenshotResult,
BrowserEvalResult,
BrowserTabListResult,
BrowserTabSwitchResult
} from '../shared/runtime-types'
import {
RuntimeClient,
@ -39,7 +52,7 @@ type CommandSpec = {
const DEFAULT_TERMINAL_WAIT_RPC_TIMEOUT_MS = 5 * 60 * 1000
const GLOBAL_FLAGS = ['help', 'json']
const COMMAND_SPECS: CommandSpec[] = [
export const COMMAND_SPECS: CommandSpec[] = [
{
path: ['open'],
summary: 'Launch Orca and wait for the runtime to be reachable',
@ -169,6 +182,85 @@ const COMMAND_SPECS: CommandSpec[] = [
summary: 'Stop terminals for a worktree',
usage: 'orca terminal stop --worktree <selector> [--json]',
allowedFlags: [...GLOBAL_FLAGS, 'worktree']
},
// ── Browser automation ──
{
path: ['snapshot'],
summary: 'Capture an accessibility snapshot of the active browser tab',
usage: 'orca snapshot [--json]',
allowedFlags: [...GLOBAL_FLAGS]
},
{
path: ['screenshot'],
summary: 'Capture a viewport screenshot of the active browser tab',
usage: 'orca screenshot [--format <png|jpeg>] [--json]',
allowedFlags: [...GLOBAL_FLAGS, 'format']
},
{
path: ['click'],
summary: 'Click a browser element by ref',
usage: 'orca click --element <ref> [--json]',
allowedFlags: [...GLOBAL_FLAGS, 'element']
},
{
path: ['fill'],
summary: 'Clear and fill a browser input by ref',
usage: 'orca fill --element <ref> --value <text> [--json]',
allowedFlags: [...GLOBAL_FLAGS, 'element', 'value']
},
{
path: ['type'],
summary: 'Type text at the current browser focus',
usage: 'orca type --input <text> [--json]',
allowedFlags: [...GLOBAL_FLAGS, 'input']
},
{
path: ['select'],
summary: 'Select a dropdown option by ref',
usage: 'orca select --element <ref> --value <value> [--json]',
allowedFlags: [...GLOBAL_FLAGS, 'element', 'value']
},
{
path: ['scroll'],
summary: 'Scroll the browser viewport',
usage: 'orca scroll --direction <up|down> [--amount <pixels>] [--json]',
allowedFlags: [...GLOBAL_FLAGS, 'direction', 'amount']
},
{
path: ['goto'],
summary: 'Navigate the active browser tab to a URL',
usage: 'orca goto --url <url> [--json]',
allowedFlags: [...GLOBAL_FLAGS, 'url']
},
{
path: ['back'],
summary: 'Navigate back in browser history',
usage: 'orca back [--json]',
allowedFlags: [...GLOBAL_FLAGS]
},
{
path: ['reload'],
summary: 'Reload the active browser tab',
usage: 'orca reload [--json]',
allowedFlags: [...GLOBAL_FLAGS]
},
{
path: ['eval'],
summary: 'Evaluate JavaScript in the browser page context',
usage: 'orca eval --expression <js> [--json]',
allowedFlags: [...GLOBAL_FLAGS, 'expression']
},
{
path: ['tab', 'list'],
summary: 'List open browser tabs',
usage: 'orca tab list [--json]',
allowedFlags: [...GLOBAL_FLAGS]
},
{
path: ['tab', 'switch'],
summary: 'Switch the active browser tab',
usage: 'orca tab switch --index <n> [--json]',
allowedFlags: [...GLOBAL_FLAGS, 'index']
}
]
@ -362,6 +454,96 @@ export async function main(argv = process.argv.slice(2), cwd = process.cwd()): P
return printResult(result, json, (value) => `removed: ${value.removed}`)
}
// ── Browser automation dispatch ──
if (matches(commandPath, ['snapshot'])) {
const result = await client.call<BrowserSnapshotResult>('browser.snapshot')
return printResult(result, json, formatSnapshot)
}
if (matches(commandPath, ['screenshot'])) {
const format = getOptionalStringFlag(parsed.flags, 'format')
const result = await client.call<BrowserScreenshotResult>('browser.screenshot', {
format: format === 'jpeg' ? 'jpeg' : undefined
})
return printResult(result, json, formatScreenshot)
}
if (matches(commandPath, ['click'])) {
const element = getRequiredStringFlag(parsed.flags, 'element')
const result = await client.call<BrowserClickResult>('browser.click', { element })
return printResult(result, json, (v) => `Clicked ${v.clicked}`)
}
if (matches(commandPath, ['fill'])) {
const element = getRequiredStringFlag(parsed.flags, 'element')
const value = getRequiredStringFlag(parsed.flags, 'value')
const result = await client.call<BrowserFillResult>('browser.fill', { element, value })
return printResult(result, json, (v) => `Filled ${v.filled}`)
}
if (matches(commandPath, ['type'])) {
const input = getRequiredStringFlag(parsed.flags, 'input')
const result = await client.call<BrowserTypeResult>('browser.type', { input })
return printResult(result, json, () => 'Typed input')
}
if (matches(commandPath, ['select'])) {
const element = getRequiredStringFlag(parsed.flags, 'element')
const value = getRequiredStringFlag(parsed.flags, 'value')
const result = await client.call<BrowserSelectResult>('browser.select', { element, value })
return printResult(result, json, (v) => `Selected ${v.selected}`)
}
if (matches(commandPath, ['scroll'])) {
const direction = getRequiredStringFlag(parsed.flags, 'direction')
if (direction !== 'up' && direction !== 'down') {
throw new RuntimeClientError('invalid_argument', '--direction must be "up" or "down"')
}
const amount = getOptionalPositiveIntegerFlag(parsed.flags, 'amount')
const result = await client.call<BrowserScrollResult>('browser.scroll', {
direction,
amount
})
return printResult(result, json, (v) => `Scrolled ${v.scrolled}`)
}
if (matches(commandPath, ['goto'])) {
const url = getRequiredStringFlag(parsed.flags, 'url')
const result = await client.call<BrowserGotoResult>('browser.goto', { url })
return printResult(result, json, (v) => `Navigated to ${v.url} - ${v.title}`)
}
if (matches(commandPath, ['back'])) {
const result = await client.call<BrowserBackResult>('browser.back')
return printResult(result, json, (v) => `Back to ${v.url} - ${v.title}`)
}
if (matches(commandPath, ['reload'])) {
const result = await client.call<BrowserReloadResult>('browser.reload')
return printResult(result, json, (v) => `Reloaded ${v.url} - ${v.title}`)
}
if (matches(commandPath, ['eval'])) {
const expression = getRequiredStringFlag(parsed.flags, 'expression')
const result = await client.call<BrowserEvalResult>('browser.eval', { expression })
return printResult(result, json, (v) => v.value)
}
if (matches(commandPath, ['tab', 'list'])) {
const result = await client.call<BrowserTabListResult>('browser.tabList')
return printResult(result, json, formatTabList)
}
if (matches(commandPath, ['tab', 'switch'])) {
const index = getOptionalNonNegativeIntegerFlag(parsed.flags, 'index')
if (index === undefined) {
throw new RuntimeClientError('invalid_argument', 'Missing required --index')
}
const result = await client.call<BrowserTabSwitchResult>('browser.tabSwitch', { index })
return printResult(result, json, (v) => `Switched to tab ${v.switched}`)
}
throw new RuntimeClientError('invalid_argument', `Unknown command: ${commandPath.join(' ')}`)
} catch (error) {
if (json) {
@ -446,7 +628,9 @@ export function findCommandSpec(commandPath: string[]): CommandSpec | undefined
}
function isCommandGroup(commandPath: string[]): boolean {
return commandPath.length === 1 && ['repo', 'worktree', 'terminal'].includes(commandPath[0])
return (
commandPath.length === 1 && ['repo', 'worktree', 'terminal', 'tab'].includes(commandPath[0])
)
}
function getRequiredStringFlag(flags: Map<string, string | boolean>, name: string): string {
@ -562,6 +746,20 @@ function getOptionalPositiveIntegerFlag(
return value
}
function getOptionalNonNegativeIntegerFlag(
flags: Map<string, string | boolean>,
name: string
): number | undefined {
const value = getOptionalNumberFlag(flags, name)
if (value === undefined) {
return undefined
}
if (!Number.isInteger(value) || value < 0) {
throw new RuntimeClientError('invalid_argument', `Invalid non-negative integer for --${name}`)
}
return value
}
function getOptionalNullableNumberFlag(
flags: Map<string, string | boolean>,
name: string
@ -737,6 +935,27 @@ function formatWorktreeShow(result: { worktree: RuntimeWorktreeRecord }): string
.join('\n')
}
function formatSnapshot(result: BrowserSnapshotResult): string {
const header = `${result.title} - ${result.url}\n`
return header + result.snapshot
}
function formatScreenshot(result: BrowserScreenshotResult): string {
return `Screenshot captured (${result.format}, ${Math.round(result.data.length * 0.75)} bytes)`
}
function formatTabList(result: BrowserTabListResult): string {
if (result.tabs.length === 0) {
return 'No browser tabs open.'
}
return result.tabs
.map((t) => {
const marker = t.active ? '* ' : ' '
return `${marker}[${t.index}] ${t.title} - ${t.url}`
})
.join('\n')
}
function printHelp(commandPath: string[] = []): void {
const exactSpec = findCommandSpec(commandPath)
if (exactSpec) {
@ -785,6 +1004,21 @@ Terminals:
terminal wait Wait for a terminal condition
terminal stop Stop terminals for a worktree
Browser:
snapshot Capture an accessibility snapshot of the active browser tab
screenshot Capture a viewport screenshot of the active browser tab
click Click a browser element by ref
fill Clear and fill a browser input by ref
type Type text at the current browser focus
select Select a dropdown option by ref
scroll Scroll the browser viewport
goto Navigate the active browser tab to a URL
back Navigate back in browser history
reload Reload the active browser tab
eval Evaluate JavaScript in the browser page context
tab list List open browser tabs
tab switch Switch the active browser tab
Common Commands:
orca open [--json]
orca status [--json]
@ -840,7 +1074,12 @@ Examples:
$ orca worktree ps --limit 10
$ orca terminal list --worktree path:/Users/me/orca/workspaces/orca/cli-test-1 --json
$ orca terminal send --terminal term_123 --text "hi" --enter
$ orca terminal wait --terminal term_123 --for exit --timeout-ms 60000 --json`)
$ orca terminal wait --terminal term_123 --for exit --timeout-ms 60000 --json
$ orca goto --url https://example.com
$ orca snapshot
$ orca click --element @e3
$ orca fill --element @e5 --value "hello"
$ orca tab list --json`)
}
function formatCommandHelp(spec: CommandSpec): string {
@ -902,7 +1141,17 @@ function formatFlagHelp(flag: string): string {
text: '--text <text> Text to send to the terminal',
'timeout-ms': '--timeout-ms <ms> Maximum wait time before timing out',
worktree:
'--worktree <selector> Worktree selector such as id:<id>, branch:<branch>, issue:<number>, path:<path>, or active/current'
'--worktree <selector> Worktree selector such as id:<id>, branch:<branch>, issue:<number>, path:<path>, or active/current',
// Browser automation flags
element: '--element <ref> Element ref from snapshot (e.g. @e3)',
url: '--url <url> URL to navigate to',
value: '--value <text> Value to fill or select',
input: '--input <text> Text to type at current focus',
expression: '--expression <js> JavaScript expression to evaluate',
direction: '--direction <up|down> Scroll direction',
amount: '--amount <pixels> Scroll distance in pixels',
index: '--index <n> 0-based tab index to switch to',
format: '--format <png|jpeg> Screenshot image format'
}
return helpByFlag[flag] ?? `--${flag}`

@ -383,6 +383,12 @@ export function getDefaultUserDataPath(
platform: NodeJS.Platform = process.platform,
homeDir = homedir()
): string {
// Why: in dev mode, the Electron app writes runtime metadata to `orca-dev`
// instead of `orca` to avoid clobbering the production app's metadata. The
// CLI needs to find the same metadata file, so respect this env var override.
if (process.env.ORCA_USER_DATA_PATH) {
return process.env.ORCA_USER_DATA_PATH
}
if (platform === 'darwin') {
return join(homeDir, 'Library', 'Application Support', 'orca')
}

@ -71,7 +71,7 @@ function safeOrigin(rawUrl: string): string {
}
}
class BrowserManager {
export class BrowserManager {
private readonly webContentsIdByTabId = new Map<string, number>()
// Why: reverse map enables O(1) guest→tab lookups instead of O(N) linear
// scans on every mouse event, load failure, permission, and popup event.

@ -0,0 +1,504 @@
/* eslint-disable max-lines -- Why: integration test covering the full browser automation pipeline end-to-end. */
import { mkdtempSync } from 'fs'
import { tmpdir } from 'os'
import { join } from 'path'
import { createConnection } from 'net'
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest'
// ── Electron mocks ──
const { webContentsFromIdMock } = vi.hoisted(() => ({
webContentsFromIdMock: vi.fn()
}))
vi.mock('electron', () => ({
webContents: { fromId: webContentsFromIdMock },
shell: { openExternal: vi.fn() },
ipcMain: { handle: vi.fn(), removeHandler: vi.fn(), on: vi.fn() },
app: { getPath: vi.fn(() => '/tmp'), isPackaged: false }
}))
vi.mock('../git/worktree', () => ({
listWorktrees: vi.fn().mockResolvedValue([])
}))
import { BrowserManager } from './browser-manager'
import { CdpBridge } from './cdp-bridge'
import { OrcaRuntimeService } from '../runtime/orca-runtime'
import { OrcaRuntimeRpcServer } from '../runtime/runtime-rpc'
import { readRuntimeMetadata } from '../runtime/runtime-metadata'
// ── CDP response builders ──
type AXNode = {
nodeId: string
backendDOMNodeId?: number
role?: { type: string; value: string }
name?: { type: string; value: string }
properties?: { name: string; value: { type: string; value: unknown } }[]
childIds?: string[]
ignored?: boolean
}
function axNode(
id: string,
role: string,
name: string,
opts?: { childIds?: string[]; backendDOMNodeId?: number }
): AXNode {
return {
nodeId: id,
backendDOMNodeId: opts?.backendDOMNodeId ?? parseInt(id, 10) * 100,
role: { type: 'role', value: role },
name: { type: 'computedString', value: name },
childIds: opts?.childIds
}
}
const EXAMPLE_COM_TREE: AXNode[] = [
axNode('1', 'WebArea', 'Example Domain', { childIds: ['2', '3', '4'] }),
axNode('2', 'heading', 'Example Domain'),
axNode('3', 'staticText', 'This domain is for use in illustrative examples.'),
axNode('4', 'link', 'More information...', { backendDOMNodeId: 400 })
]
const SEARCH_PAGE_TREE: AXNode[] = [
axNode('1', 'WebArea', 'Search', { childIds: ['2', '3', '4', '5'] }),
axNode('2', 'navigation', 'Main Nav', { childIds: ['3'] }),
axNode('3', 'link', 'Home', { backendDOMNodeId: 300 }),
axNode('4', 'textbox', 'Search query', { backendDOMNodeId: 400 }),
axNode('5', 'button', 'Search', { backendDOMNodeId: 500 })
]
// ── Mock WebContents factory ──
function createMockGuest(id: number, url: string, title: string) {
let currentUrl = url
let currentTitle = title
let currentTree = EXAMPLE_COM_TREE
let navHistoryId = 1
const sendCommandMock = vi.fn(async (method: string, params?: Record<string, unknown>) => {
switch (method) {
case 'Page.enable':
case 'DOM.enable':
case 'Accessibility.enable':
return {}
case 'Accessibility.getFullAXTree':
return { nodes: currentTree }
case 'Page.getNavigationHistory':
return {
entries: [{ id: navHistoryId, url: currentUrl }],
currentIndex: 0
}
case 'Page.navigate': {
const targetUrl = (params as { url: string }).url
if (targetUrl.includes('nonexistent.invalid')) {
return { errorText: 'net::ERR_NAME_NOT_RESOLVED' }
}
navHistoryId++
currentUrl = targetUrl
if (targetUrl.includes('search.example.com')) {
currentTitle = 'Search'
currentTree = SEARCH_PAGE_TREE
} else {
currentTitle = 'Example Domain'
currentTree = EXAMPLE_COM_TREE
}
return {}
}
case 'Runtime.evaluate': {
const expr = (params as { expression: string }).expression
if (expr === 'document.readyState') {
return { result: { value: 'complete' } }
}
if (expr.includes('innerWidth')) {
return { result: { value: JSON.stringify({ w: 1280, h: 720 }) } }
}
// eslint-disable-next-line no-eval
return { result: { value: String(eval(expr)), type: 'string' } }
}
case 'DOM.scrollIntoViewIfNeeded':
return {}
case 'DOM.getBoxModel':
return { model: { content: [100, 200, 300, 200, 300, 250, 100, 250] } }
case 'Input.dispatchMouseEvent':
return {}
case 'Input.insertText':
return {}
case 'Input.dispatchKeyEvent':
return {}
case 'DOM.focus':
return {}
case 'DOM.describeNode':
return { node: { nodeId: 1 } }
case 'DOM.requestNode':
return { nodeId: 1 }
case 'DOM.resolveNode':
return { object: { objectId: 'obj-1' } }
case 'Runtime.callFunctionOn':
return { result: { value: undefined } }
case 'Page.captureScreenshot':
return {
data: 'iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg=='
}
case 'Page.reload':
return {}
default:
throw new Error(`Unexpected CDP method: ${method}`)
}
})
const debuggerListeners = new Map<string, ((...args: unknown[]) => void)[]>()
const guest = {
id,
isDestroyed: vi.fn(() => false),
getType: vi.fn(() => 'webview'),
getURL: vi.fn(() => currentUrl),
getTitle: vi.fn(() => currentTitle),
setBackgroundThrottling: vi.fn(),
setWindowOpenHandler: vi.fn(),
on: vi.fn(),
off: vi.fn(),
debugger: {
attach: vi.fn(),
detach: vi.fn(),
sendCommand: sendCommandMock,
on: vi.fn((event: string, handler: (...args: unknown[]) => void) => {
const handlers = debuggerListeners.get(event) ?? []
handlers.push(handler)
debuggerListeners.set(event, handlers)
}),
off: vi.fn()
}
}
return { guest, sendCommandMock }
}
// ── RPC helper ──
async function sendRequest(
endpoint: string,
request: Record<string, unknown>
): Promise<Record<string, unknown>> {
return await new Promise((resolve, reject) => {
const socket = createConnection(endpoint)
let buffer = ''
socket.setEncoding('utf8')
socket.once('error', reject)
socket.on('data', (chunk) => {
buffer += chunk
const newlineIndex = buffer.indexOf('\n')
if (newlineIndex === -1) {
return
}
const message = buffer.slice(0, newlineIndex)
socket.end()
resolve(JSON.parse(message) as Record<string, unknown>)
})
socket.on('connect', () => {
socket.write(`${JSON.stringify(request)}\n`)
})
})
}
// ── Tests ──
describe('Browser automation pipeline (integration)', () => {
let server: OrcaRuntimeRpcServer
let endpoint: string
let authToken: string
const GUEST_WC_ID = 5001
const RENDERER_WC_ID = 1
beforeEach(async () => {
const { guest } = createMockGuest(GUEST_WC_ID, 'https://example.com', 'Example Domain')
webContentsFromIdMock.mockImplementation((id: number) => {
if (id === GUEST_WC_ID) {
return guest
}
return null
})
const browserManager = new BrowserManager()
// Simulate the attach-time policy (normally done in will-attach-webview)
browserManager.attachGuestPolicies(guest as never)
browserManager.registerGuest({
browserPageId: 'page-1',
webContentsId: GUEST_WC_ID,
rendererWebContentsId: RENDERER_WC_ID
})
const cdpBridge = new CdpBridge(browserManager)
cdpBridge.setActiveTab(GUEST_WC_ID)
const userDataPath = mkdtempSync(join(tmpdir(), 'browser-e2e-'))
const runtime = new OrcaRuntimeService()
runtime.setCdpBridge(cdpBridge)
server = new OrcaRuntimeRpcServer({ runtime, userDataPath })
await server.start()
const metadata = readRuntimeMetadata(userDataPath)!
endpoint = metadata.transport!.endpoint
authToken = metadata.authToken!
})
afterEach(async () => {
await server.stop()
})
async function rpc(method: string, params?: Record<string, unknown>) {
const response = await sendRequest(endpoint, {
id: `req_${method}`,
authToken,
method,
...(params ? { params } : {})
})
return response
}
// ── Snapshot ──
it('takes a snapshot and returns refs for interactive elements', async () => {
const res = await rpc('browser.snapshot')
expect(res.ok).toBe(true)
const result = res.result as {
snapshot: string
refs: { ref: string; role: string; name: string }[]
url: string
title: string
}
expect(result.url).toBe('https://example.com')
expect(result.title).toBe('Example Domain')
expect(result.snapshot).toContain('heading "Example Domain"')
expect(result.snapshot).toContain('link "More information..."')
expect(result.refs).toHaveLength(1)
expect(result.refs[0]).toMatchObject({
ref: '@e1',
role: 'link',
name: 'More information...'
})
})
// ── Click ──
it('clicks an element by ref after snapshot', async () => {
await rpc('browser.snapshot')
const res = await rpc('browser.click', { element: '@e1' })
expect(res.ok).toBe(true)
expect((res.result as { clicked: string }).clicked).toBe('@e1')
})
it('returns error when clicking without a prior snapshot', async () => {
const res = await rpc('browser.click', { element: '@e1' })
expect(res.ok).toBe(false)
expect((res.error as { code: string }).code).toBe('browser_stale_ref')
})
it('returns error for non-existent ref', async () => {
await rpc('browser.snapshot')
const res = await rpc('browser.click', { element: '@e999' })
expect(res.ok).toBe(false)
expect((res.error as { code: string }).code).toBe('browser_ref_not_found')
})
// ── Navigation ──
it('navigates to a URL and invalidates refs', async () => {
await rpc('browser.snapshot')
const gotoRes = await rpc('browser.goto', { url: 'https://search.example.com' })
expect(gotoRes.ok).toBe(true)
const gotoResult = gotoRes.result as { url: string; title: string }
expect(gotoResult.url).toBe('https://search.example.com')
expect(gotoResult.title).toBe('Search')
// Old refs should be stale after navigation
const clickRes = await rpc('browser.click', { element: '@e1' })
expect(clickRes.ok).toBe(false)
expect((clickRes.error as { code: string }).code).toBe('browser_stale_ref')
// Re-snapshot should work and show new page
const snapRes = await rpc('browser.snapshot')
expect(snapRes.ok).toBe(true)
const snapResult = snapRes.result as { snapshot: string; refs: { name: string }[] }
expect(snapResult.snapshot).toContain('Search')
expect(snapResult.refs.map((r) => r.name)).toContain('Search')
expect(snapResult.refs.map((r) => r.name)).toContain('Home')
})
it('returns error for failed navigation', async () => {
const res = await rpc('browser.goto', { url: 'https://nonexistent.invalid' })
expect(res.ok).toBe(false)
expect((res.error as { code: string }).code).toBe('browser_navigation_failed')
})
// ── Fill ──
it('fills an input by ref', async () => {
await rpc('browser.goto', { url: 'https://search.example.com' })
await rpc('browser.snapshot')
// @e2 should be the textbox "Search query" on the search page
const res = await rpc('browser.fill', { element: '@e2', value: 'hello world' })
expect(res.ok).toBe(true)
expect((res.result as { filled: string }).filled).toBe('@e2')
})
// ── Type ──
it('types text at current focus', async () => {
const res = await rpc('browser.type', { input: 'some text' })
expect(res.ok).toBe(true)
expect((res.result as { typed: boolean }).typed).toBe(true)
})
// ── Select ──
it('selects a dropdown option by ref', async () => {
await rpc('browser.goto', { url: 'https://search.example.com' })
await rpc('browser.snapshot')
const res = await rpc('browser.select', { element: '@e2', value: 'option-1' })
expect(res.ok).toBe(true)
expect((res.result as { selected: string }).selected).toBe('@e2')
})
// ── Scroll ──
it('scrolls the viewport', async () => {
const res = await rpc('browser.scroll', { direction: 'down' })
expect(res.ok).toBe(true)
expect((res.result as { scrolled: string }).scrolled).toBe('down')
const res2 = await rpc('browser.scroll', { direction: 'up', amount: 200 })
expect(res2.ok).toBe(true)
expect((res2.result as { scrolled: string }).scrolled).toBe('up')
})
// ── Reload ──
it('reloads the page', async () => {
const res = await rpc('browser.reload')
expect(res.ok).toBe(true)
expect((res.result as { url: string }).url).toBe('https://example.com')
})
// ── Screenshot ──
it('captures a screenshot', async () => {
const res = await rpc('browser.screenshot', { format: 'png' })
expect(res.ok).toBe(true)
const result = res.result as { data: string; format: string }
expect(result.format).toBe('png')
expect(result.data.length).toBeGreaterThan(0)
})
// ── Eval ──
it('evaluates JavaScript in the page context', async () => {
const res = await rpc('browser.eval', { expression: '2 + 2' })
expect(res.ok).toBe(true)
expect((res.result as { value: string }).value).toBe('4')
})
// ── Tab management ──
it('lists open tabs', async () => {
const res = await rpc('browser.tabList')
expect(res.ok).toBe(true)
const result = res.result as { tabs: { index: number; url: string; active: boolean }[] }
expect(result.tabs).toHaveLength(1)
expect(result.tabs[0]).toMatchObject({
index: 0,
url: 'https://example.com',
active: true
})
})
it('returns error for out-of-range tab switch', async () => {
const res = await rpc('browser.tabSwitch', { index: 5 })
expect(res.ok).toBe(false)
expect((res.error as { code: string }).code).toBe('browser_tab_not_found')
})
// ── Full agent workflow simulation ──
it('simulates a complete agent workflow: navigate → snapshot → interact → re-snapshot', async () => {
// 1. Navigate to search page
const gotoRes = await rpc('browser.goto', { url: 'https://search.example.com' })
expect(gotoRes.ok).toBe(true)
// 2. Snapshot the page
const snap1 = await rpc('browser.snapshot')
expect(snap1.ok).toBe(true)
const snap1Result = snap1.result as {
snapshot: string
refs: { ref: string; role: string; name: string }[]
}
// Verify we see the search page structure
expect(snap1Result.snapshot).toContain('[Main Nav]')
expect(snap1Result.snapshot).toContain('text input "Search query"')
expect(snap1Result.snapshot).toContain('button "Search"')
// 3. Fill the search input
const searchInput = snap1Result.refs.find((r) => r.name === 'Search query')
expect(searchInput).toBeDefined()
const fillRes = await rpc('browser.fill', {
element: searchInput!.ref,
value: 'integration testing'
})
expect(fillRes.ok).toBe(true)
// 4. Click the search button
const searchBtn = snap1Result.refs.find((r) => r.name === 'Search')
expect(searchBtn).toBeDefined()
const clickRes = await rpc('browser.click', { element: searchBtn!.ref })
expect(clickRes.ok).toBe(true)
// 5. Take a screenshot
const ssRes = await rpc('browser.screenshot')
expect(ssRes.ok).toBe(true)
// 6. Check tab list
const tabRes = await rpc('browser.tabList')
expect(tabRes.ok).toBe(true)
const tabs = (tabRes.result as { tabs: { url: string }[] }).tabs
expect(tabs[0].url).toBe('https://search.example.com')
})
// ── No tab errors ──
it('returns browser_no_tab when no tabs are registered', async () => {
// Create a fresh setup with no registered tabs
const emptyManager = new BrowserManager()
const emptyBridge = new CdpBridge(emptyManager)
const userDataPath2 = mkdtempSync(join(tmpdir(), 'browser-e2e-empty-'))
const runtime2 = new OrcaRuntimeService()
runtime2.setCdpBridge(emptyBridge)
const server2 = new OrcaRuntimeRpcServer({ runtime: runtime2, userDataPath: userDataPath2 })
await server2.start()
const metadata2 = readRuntimeMetadata(userDataPath2)!
const res = await sendRequest(metadata2.transport!.endpoint, {
id: 'req_no_tab',
authToken: metadata2.authToken,
method: 'browser.snapshot'
})
expect(res.ok).toBe(false)
expect((res.error as { code: string }).code).toBe('browser_no_tab')
await server2.stop()
})
})
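
The workflow test above looks refs up by name alone, which can be ambiguous when a button and an input share a label. A minimal sketch of a stricter lookup, assuming the `refs` array shape used in these tests (`ref`, `role`, `name`). `SnapshotRef` and `findRef` are hypothetical names for illustration and are not part of this commit:

```typescript
// Shape of one entry in the snapshot's `refs` array, as used in the tests above.
type SnapshotRef = { ref: string; role: string; name: string }

// Hypothetical helper: return the ref whose role and name both match exactly,
// or undefined when the snapshot contains no such element.
function findRef(refs: SnapshotRef[], role: string, name: string): string | undefined {
  const match = refs.find((r) => r.role === role && r.name === name)
  return match?.ref
}

// Example data: two elements share the label "Search" but differ in role.
const exampleRefs: SnapshotRef[] = [
  { ref: '@e1', role: 'text input', name: 'Search' },
  { ref: '@e2', role: 'button', name: 'Search' }
]
const searchButtonRef = findRef(exampleRefs, 'button', 'Search')
```

Matching on the formatted role (for example `text input` rather than `textbox`) follows what the snapshot reports, so a caller would compare against the strings it actually sees in the output.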

@ -0,0 +1,638 @@
/* eslint-disable max-lines -- Why: the CDP bridge owns debugger lifecycle, ref map management, command serialization, and all browser interaction logic in one module so the browser automation boundary stays coherent. */
import { webContents } from 'electron'
import type {
BrowserClickResult,
BrowserEvalResult,
BrowserFillResult,
BrowserGotoResult,
BrowserScreenshotResult,
BrowserScrollResult,
BrowserSelectResult,
BrowserSnapshotResult,
BrowserTabInfo,
BrowserTabListResult,
BrowserTabSwitchResult,
BrowserTypeResult
} from '../../shared/runtime-types'
import { buildSnapshot, type CdpCommandSender, type SnapshotResult } from './snapshot-engine'
import type { BrowserManager } from './browser-manager'
export class BrowserError extends Error {
constructor(
readonly code: string,
message: string
) {
super(message)
}
}
type TabState = {
navigationId: string | null
snapshotResult: SnapshotResult | null
debuggerAttached: boolean
}
type QueuedCommand = {
execute: () => Promise<unknown>
resolve: (value: unknown) => void
reject: (reason: unknown) => void
}
export class CdpBridge {
private activeWebContentsId: number | null = null
private readonly tabState = new Map<string, TabState>()
private readonly commandQueues = new Map<string, QueuedCommand[]>()
private readonly processingQueues = new Set<string>()
private readonly browserManager: BrowserManager
constructor(browserManager: BrowserManager) {
this.browserManager = browserManager
}
setActiveTab(webContentsId: number): void {
this.activeWebContentsId = webContentsId
}
getActiveWebContentsId(): number | null {
return this.activeWebContentsId
}
async snapshot(): Promise<BrowserSnapshotResult> {
return this.enqueueCommand(async () => {
const guest = this.getActiveGuest()
const sender = this.makeCdpSender(guest)
await this.ensureDebuggerAttached(guest)
const result = await buildSnapshot(sender)
const tabId = this.resolveTabId(guest.id)
const state = this.getOrCreateTabState(tabId)
state.snapshotResult = result
const navId = await this.getNavigationId(sender)
state.navigationId = navId
return {
snapshot: result.snapshot,
refs: result.refs,
url: guest.getURL(),
title: guest.getTitle()
}
})
}
async click(element: string): Promise<BrowserClickResult> {
return this.enqueueCommand(async () => {
const guest = this.getActiveGuest()
const sender = this.makeCdpSender(guest)
await this.ensureDebuggerAttached(guest)
const node = await this.resolveRef(guest, sender, element)
await sender('DOM.scrollIntoViewIfNeeded', { backendNodeId: node.backendDOMNodeId })
const { model } = (await sender('DOM.getBoxModel', {
backendNodeId: node.backendDOMNodeId
})) as { model: { content: number[] } }
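// The content quad is [x1, y1, x2, y2, x3, y3, x4, y4]. Corners 1 and 3 are
// diagonally opposite, so their midpoint is the center of the element's box.
const [x1, y1, , , x3, y3] = model.content
const cx = (x1 + x3) / 2
const cy = (y1 + y3) / 2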
await sender('Input.dispatchMouseEvent', {
type: 'mousePressed',
x: cx,
y: cy,
button: 'left',
clickCount: 1
})
await sender('Input.dispatchMouseEvent', {
type: 'mouseReleased',
x: cx,
y: cy,
button: 'left',
clickCount: 1
})
return { clicked: element }
})
}
async goto(url: string): Promise<BrowserGotoResult> {
return this.enqueueCommand(async () => {
const guest = this.getActiveGuest()
const sender = this.makeCdpSender(guest)
await this.ensureDebuggerAttached(guest)
const { errorText } = (await sender('Page.navigate', { url })) as {
errorText?: string
}
if (errorText) {
throw new BrowserError('browser_navigation_failed', `Navigation failed: ${errorText}`)
}
await this.waitForLoad(sender)
this.invalidateRefMap(guest.id)
return { url: guest.getURL(), title: guest.getTitle() }
})
}
async fill(element: string, value: string): Promise<BrowserFillResult> {
return this.enqueueCommand(async () => {
const guest = this.getActiveGuest()
const sender = this.makeCdpSender(guest)
await this.ensureDebuggerAttached(guest)
const node = await this.resolveRef(guest, sender, element)
await sender('DOM.focus', { backendNodeId: node.backendDOMNodeId })
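// Why: select-all then delete clears any existing value before typing,
// matching the behavior of Playwright's fill() and agent-browser's fill.
// CDP modifier bits: 2 = Ctrl, 4 = Meta (Cmd), so macOS sends Cmd+A and
// other platforms send Ctrl+A.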
await sender('Input.dispatchKeyEvent', {
type: 'keyDown',
key: 'a',
modifiers: process.platform === 'darwin' ? 4 : 2
})
await sender('Input.dispatchKeyEvent', {
type: 'keyUp',
key: 'a',
modifiers: process.platform === 'darwin' ? 4 : 2
})
await sender('Input.dispatchKeyEvent', { type: 'keyDown', key: 'Delete' })
await sender('Input.dispatchKeyEvent', { type: 'keyUp', key: 'Delete' })
await sender('Input.insertText', { text: value })
return { filled: element }
})
}
async type(input: string): Promise<BrowserTypeResult> {
return this.enqueueCommand(async () => {
const guest = this.getActiveGuest()
const sender = this.makeCdpSender(guest)
await this.ensureDebuggerAttached(guest)
await sender('Input.insertText', { text: input })
return { typed: true }
})
}
async select(element: string, value: string): Promise<BrowserSelectResult> {
return this.enqueueCommand(async () => {
const guest = this.getActiveGuest()
const sender = this.makeCdpSender(guest)
await this.ensureDebuggerAttached(guest)
const node = await this.resolveRef(guest, sender, element)
// Why: DOM.resolveNode accepts a backendNodeId directly, whereas
// DOM.requestNode expects an objectId, so resolve the node in one step.
const { object } = (await sender('DOM.resolveNode', {
backendNodeId: node.backendDOMNodeId
})) as { object: { objectId: string } }
await sender('Runtime.callFunctionOn', {
objectId: object.objectId,
functionDeclaration: `function(val) {
this.value = val;
this.dispatchEvent(new Event('input', { bubbles: true }));
this.dispatchEvent(new Event('change', { bubbles: true }));
}`,
arguments: [{ value }]
})
return { selected: element }
})
}
async scroll(direction: 'up' | 'down', amount?: number): Promise<BrowserScrollResult> {
return this.enqueueCommand(async () => {
const guest = this.getActiveGuest()
const sender = this.makeCdpSender(guest)
await this.ensureDebuggerAttached(guest)
const { result: viewportResult } = (await sender('Runtime.evaluate', {
expression: 'JSON.stringify({ w: window.innerWidth, h: window.innerHeight })',
returnByValue: true
})) as { result: { value: string } }
const viewport = JSON.parse(viewportResult.value) as { w: number; h: number }
const scrollAmount = amount ?? viewport.h
const deltaY = direction === 'down' ? scrollAmount : -scrollAmount
await sender('Input.dispatchMouseEvent', {
type: 'mouseWheel',
x: viewport.w / 2,
y: viewport.h / 2,
deltaX: 0,
deltaY
})
return { scrolled: direction }
})
}
async back(): Promise<{ url: string; title: string }> {
return this.enqueueCommand(async () => {
const guest = this.getActiveGuest()
const sender = this.makeCdpSender(guest)
await this.ensureDebuggerAttached(guest)
await sender('Page.navigateToHistoryEntry', {
entryId: await this.getPreviousHistoryEntryId(sender)
})
await this.waitForLoad(sender)
this.invalidateRefMap(guest.id)
return { url: guest.getURL(), title: guest.getTitle() }
})
}
async reload(): Promise<{ url: string; title: string }> {
return this.enqueueCommand(async () => {
const guest = this.getActiveGuest()
const sender = this.makeCdpSender(guest)
await this.ensureDebuggerAttached(guest)
await sender('Page.reload')
await this.waitForLoad(sender)
this.invalidateRefMap(guest.id)
return { url: guest.getURL(), title: guest.getTitle() }
})
}
async screenshot(format: 'png' | 'jpeg' = 'png'): Promise<BrowserScreenshotResult> {
return this.enqueueCommand(async () => {
const guest = this.getActiveGuest()
const sender = this.makeCdpSender(guest)
await this.ensureDebuggerAttached(guest)
const { data } = (await sender('Page.captureScreenshot', {
format
})) as { data: string }
return { data, format }
})
}
async evaluate(expression: string): Promise<BrowserEvalResult> {
return this.enqueueCommand(async () => {
const guest = this.getActiveGuest()
const sender = this.makeCdpSender(guest)
await this.ensureDebuggerAttached(guest)
const { result, exceptionDetails } = (await sender('Runtime.evaluate', {
expression,
returnByValue: true
})) as {
result: { value?: unknown; type: string; description?: string }
exceptionDetails?: { text: string; exception?: { description?: string } }
}
if (exceptionDetails) {
throw new BrowserError(
'browser_eval_error',
exceptionDetails.exception?.description ?? exceptionDetails.text
)
}
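// Why: with returnByValue, objects and arrays arrive as plain JSON values.
// String() would flatten them to "[object Object]", so serialize them instead.
const { value } = result
if (value !== undefined) {
return { value: typeof value === 'object' ? JSON.stringify(value) : String(value) }
}
return { value: result.description ?? '' }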
})
}
tabList(): BrowserTabListResult {
const tabs: BrowserTabInfo[] = []
let index = 0
for (const [_tabId, wcId] of this.getRegisteredTabs()) {
const guest = webContents.fromId(wcId)
if (!guest || guest.isDestroyed()) {
continue
}
tabs.push({
index,
url: guest.getURL(),
title: guest.getTitle(),
active: wcId === this.activeWebContentsId
})
index++
}
return { tabs }
}
async tabSwitch(index: number): Promise<BrowserTabSwitchResult> {
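// Why: tabList() skips destroyed guests when numbering tabs, so apply the
// same filter here to keep indices consistent between the two commands.
const entries = [...this.getRegisteredTabs()].filter(([, wcId]) => {
const guest = webContents.fromId(wcId)
return !!guest && !guest.isDestroyed()
})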
if (index < 0 || index >= entries.length) {
throw new BrowserError(
'browser_tab_not_found',
`Tab index ${index} is out of range. ${entries.length} tab(s) open.`
)
}
const [_tabId, wcId] = entries[index]
if (this.activeWebContentsId !== null) {
this.invalidateRefMap(this.activeWebContentsId)
}
this.activeWebContentsId = wcId
return { switched: index }
}
onTabClosed(webContentsId: number): void {
if (this.activeWebContentsId === webContentsId) {
this.activeWebContentsId = null
}
const tabId = this.resolveTabIdSafe(webContentsId)
if (tabId) {
this.tabState.delete(tabId)
this.commandQueues.delete(tabId)
}
}
onTabChanged(webContentsId: number): void {
this.activeWebContentsId = webContentsId
}
// ── Private helpers ──
private getActiveGuest(): Electron.WebContents {
if (this.activeWebContentsId !== null) {
const guest = webContents.fromId(this.activeWebContentsId)
if (guest && !guest.isDestroyed()) {
return guest
}
// Why: the stored webContentsId may be stale after a Chromium process swap
// (navigation to a different-origin page, crash recovery). Fall through to
// the auto-select logic rather than immediately failing, since the tab may
// still be alive under a new webContentsId.
this.activeWebContentsId = null
}
const tabs = [...this.getRegisteredTabs()]
if (tabs.length === 0) {
throw new BrowserError(
'browser_no_tab',
'No browser tab is open. Use the Orca UI to open a browser tab first.'
)
}
if (tabs.length === 1) {
this.activeWebContentsId = tabs[0][1]
} else {
throw new BrowserError(
'browser_no_tab',
"Multiple browser tabs are open. Run 'orca tab list' and 'orca tab switch --index <n>' to select one."
)
}
const guest = webContents.fromId(this.activeWebContentsId!)
if (!guest || guest.isDestroyed()) {
this.activeWebContentsId = null
throw new BrowserError(
'browser_debugger_detached',
"The active browser tab was closed. Run 'orca tab list' to find remaining tabs."
)
}
return guest
}
private getRegisteredTabs(): Map<string, number> {
// Why: BrowserManager does not expose a public way to enumerate tabs, so the
// bridge reads its private webContentsIdByTabId map through a type cast. This
// bypasses BrowserManager's encapsulation. In the future a public listTabs()
// method on BrowserManager would be cleaner.
return (this.browserManager as unknown as { webContentsIdByTabId: Map<string, number> })
.webContentsIdByTabId
}
private resolveTabId(webContentsId: number): string {
for (const [tabId, wcId] of this.getRegisteredTabs()) {
if (wcId === webContentsId) {
return tabId
}
}
throw new BrowserError('browser_debugger_detached', 'Tab is no longer registered.')
}
private resolveTabIdSafe(webContentsId: number): string | null {
for (const [tabId, wcId] of this.getRegisteredTabs()) {
if (wcId === webContentsId) {
return tabId
}
}
return null
}
private getOrCreateTabState(tabId: string): TabState {
let state = this.tabState.get(tabId)
if (!state) {
state = { navigationId: null, snapshotResult: null, debuggerAttached: false }
this.tabState.set(tabId, state)
}
return state
}
private async ensureDebuggerAttached(guest: Electron.WebContents): Promise<void> {
const tabId = this.resolveTabId(guest.id)
const state = this.getOrCreateTabState(tabId)
if (state.debuggerAttached) {
return
}
try {
guest.debugger.attach('1.3')
} catch {
throw new BrowserError(
'browser_cdp_error',
'Could not attach debugger. DevTools may already be open for this tab.'
)
}
await this.makeCdpSender(guest)('Page.enable')
await this.makeCdpSender(guest)('DOM.enable')
guest.debugger.on('detach', () => {
state.debuggerAttached = false
state.snapshotResult = null
})
guest.debugger.on('message', (_event: unknown, method: string) => {
if (method === 'Page.frameNavigated') {
state.snapshotResult = null
state.navigationId = null
}
})
state.debuggerAttached = true
}
private makeCdpSender(guest: Electron.WebContents): CdpCommandSender {
return (method: string, params?: Record<string, unknown>) => {
const command = guest.debugger.sendCommand(method, params) as Promise<unknown>
// Why: Electron's CDP sendCommand can hang indefinitely if the debugger
// session is stale (e.g. after a renderer process swap that wasn't detected).
// A 10s timeout prevents the RPC from blocking until the CLI's socket timeout.
let timer: ReturnType<typeof setTimeout> | undefined
const timeout = new Promise<never>((_, reject) => {
timer = setTimeout(
() => reject(new BrowserError('browser_cdp_error', `CDP command "${method}" timed out`)),
10_000
)
})
// Clear the timer once either promise settles so completed commands do not
// leave a pending 10s timer behind.
return Promise.race([command, timeout]).finally(() => clearTimeout(timer))
}
}
private async resolveRef(
guest: Electron.WebContents,
sender: CdpCommandSender,
ref: string
): Promise<{ backendDOMNodeId: number; role: string; name: string }> {
const tabId = this.resolveTabId(guest.id)
const state = this.getOrCreateTabState(tabId)
if (!state.snapshotResult) {
throw new BrowserError(
'browser_stale_ref',
"No snapshot exists for this tab. Run 'orca snapshot' first."
)
}
const entry = state.snapshotResult.refMap.get(ref)
if (!entry) {
throw new BrowserError(
'browser_ref_not_found',
`Element ref ${ref} was not found. Run 'orca snapshot' to see available refs.`
)
}
const currentNavId = await this.getNavigationId(sender)
if (state.navigationId && currentNavId !== state.navigationId) {
state.snapshotResult = null
state.navigationId = null
throw new BrowserError(
'browser_stale_ref',
"The page has navigated since the last snapshot. Run 'orca snapshot' to get fresh refs."
)
}
try {
await sender('DOM.describeNode', { backendNodeId: entry.backendDOMNodeId })
} catch {
state.snapshotResult = null
throw new BrowserError(
'browser_stale_ref',
`Element ${ref} no longer exists in the DOM. Run 'orca snapshot' to get fresh refs.`
)
}
return entry
}
private async getNavigationId(sender: CdpCommandSender): Promise<string> {
const { entries, currentIndex } = (await sender('Page.getNavigationHistory')) as {
entries: { id: number; url: string }[]
currentIndex: number
}
const current = entries[currentIndex]
return current ? `${current.id}:${current.url}` : 'unknown'
}
private async getPreviousHistoryEntryId(sender: CdpCommandSender): Promise<number> {
const { entries, currentIndex } = (await sender('Page.getNavigationHistory')) as {
entries: { id: number }[]
currentIndex: number
}
if (currentIndex <= 0) {
throw new BrowserError('browser_navigation_failed', 'No previous history entry.')
}
return entries[currentIndex - 1].id
}
private async waitForLoad(sender: CdpCommandSender): Promise<void> {
await sender('Page.enable')
await new Promise<void>((resolve, reject) => {
// Why: without this flag the 100ms poll would keep running after the
// 30s timeout has already rejected the promise.
let settled = false
const timeout = setTimeout(() => {
settled = true
reject(new BrowserError('browser_timeout', 'Page load timed out after 30 seconds.'))
}, 30_000)
const check = async (): Promise<void> => {
if (settled) {
return
}
try {
const { result } = (await sender('Runtime.evaluate', {
expression: 'document.readyState',
returnByValue: true
})) as { result: { value: string } }
if (result.value === 'complete') {
settled = true
clearTimeout(timeout)
resolve()
} else {
setTimeout(check, 100)
}
} catch {
settled = true
clearTimeout(timeout)
reject(new BrowserError('browser_cdp_error', 'Failed to check page load state.'))
}
}
check()
})
}
private invalidateRefMap(webContentsId: number): void {
const tabId = this.resolveTabIdSafe(webContentsId)
if (tabId) {
const state = this.tabState.get(tabId)
if (state) {
state.snapshotResult = null
state.navigationId = null
}
}
}
private async enqueueCommand<T>(execute: () => Promise<T>): Promise<T> {
const guest = this.getActiveGuest()
const tabId = this.resolveTabId(guest.id)
return new Promise<T>((resolve, reject) => {
let queue = this.commandQueues.get(tabId)
if (!queue) {
queue = []
this.commandQueues.set(tabId, queue)
}
queue.push({
execute: execute as () => Promise<unknown>,
resolve: resolve as (value: unknown) => void,
reject
})
this.processQueue(tabId)
})
}
private async processQueue(tabId: string): Promise<void> {
if (this.processingQueues.has(tabId)) {
return
}
this.processingQueues.add(tabId)
const queue = this.commandQueues.get(tabId)
while (queue && queue.length > 0) {
const cmd = queue.shift()!
try {
const result = await cmd.execute()
cmd.resolve(result)
} catch (error) {
cmd.reject(error)
}
}
this.processingQueues.delete(tabId)
}
}
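
The `click` method above derives its click point from the box model returned by `DOM.getBoxModel`. The calculation can be sketched in isolation as follows. This is a simplified illustration, not the bridge's code: `Quad`, `quadCenter`, and `clickEvents` are hypothetical names, and the sketch assumes the content quad lists four corners in order as eight numbers.

```typescript
// Eight numbers: [x1, y1, x2, y2, x3, y3, x4, y4], one (x, y) pair per corner.
type Quad = [number, number, number, number, number, number, number, number]
type Point = { x: number; y: number }

// Corners 1 and 3 are diagonally opposite, so their midpoint is the center
// of the box (this also holds for rotated or skewed parallelograms).
function quadCenter(quad: Quad): Point {
  const [x1, y1, , , x3, y3] = quad
  return { x: (x1 + x3) / 2, y: (y1 + y3) / 2 }
}

// Build the press/release pair that a single left click consists of.
// The field names mirror the Input.dispatchMouseEvent parameters used above.
function clickEvents(center: Point) {
  const base = { x: center.x, y: center.y, button: 'left', clickCount: 1 }
  return [
    { type: 'mousePressed', ...base },
    { type: 'mouseReleased', ...base }
  ]
}

// A 100x40 box whose top-left corner sits at (10, 20).
const exampleCenter = quadCenter([10, 20, 110, 20, 110, 60, 10, 60])
const exampleEvents = clickEvents(exampleCenter)
```

Using the diagonal midpoint rather than averaging only the top edge keeps the click inside the element even when the page applies a transform to it.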

@ -0,0 +1,196 @@
import { describe, expect, it, vi } from 'vitest'
import { buildSnapshot, type CdpCommandSender } from './snapshot-engine'
type AXNode = {
nodeId: string
backendDOMNodeId?: number
role?: { type: string; value: string }
name?: { type: string; value: string }
properties?: { name: string; value: { type: string; value: unknown } }[]
childIds?: string[]
ignored?: boolean
}
function makeSender(nodes: AXNode[]): CdpCommandSender {
return vi.fn(async (method: string) => {
if (method === 'Accessibility.enable') {
return {}
}
if (method === 'Accessibility.getFullAXTree') {
return { nodes }
}
throw new Error(`Unexpected CDP method: ${method}`)
})
}
function node(
id: string,
role: string,
name: string,
opts?: {
childIds?: string[]
backendDOMNodeId?: number
ignored?: boolean
properties?: AXNode['properties']
}
): AXNode {
return {
nodeId: id,
backendDOMNodeId: opts?.backendDOMNodeId ?? parseInt(id, 10),
role: { type: 'role', value: role },
name: { type: 'computedString', value: name },
childIds: opts?.childIds,
ignored: opts?.ignored,
properties: opts?.properties
}
}
describe('buildSnapshot', () => {
it('returns empty snapshot for empty tree', async () => {
const result = await buildSnapshot(makeSender([]))
expect(result.snapshot).toBe('')
expect(result.refs).toEqual([])
expect(result.refMap.size).toBe(0)
})
it('assigns refs to interactive elements', async () => {
const nodes: AXNode[] = [
node('1', 'WebArea', 'page', { childIds: ['2', '3'] }),
node('2', 'button', 'Submit', { backendDOMNodeId: 10 }),
node('3', 'link', 'Home', { backendDOMNodeId: 11 })
]
const result = await buildSnapshot(makeSender(nodes))
expect(result.refs).toHaveLength(2)
expect(result.refs[0]).toEqual({ ref: '@e1', role: 'button', name: 'Submit' })
expect(result.refs[1]).toEqual({ ref: '@e2', role: 'link', name: 'Home' })
expect(result.snapshot).toContain('[@e1] button "Submit"')
expect(result.snapshot).toContain('[@e2] link "Home"')
})
it('renders text inputs with friendly role name', async () => {
const nodes: AXNode[] = [
node('1', 'WebArea', 'page', { childIds: ['2'] }),
node('2', 'textbox', 'Email', { backendDOMNodeId: 10 })
]
const result = await buildSnapshot(makeSender(nodes))
expect(result.snapshot).toContain('text input "Email"')
})
it('renders landmarks without refs', async () => {
const nodes: AXNode[] = [
node('1', 'WebArea', 'page', { childIds: ['2'] }),
node('2', 'navigation', 'Main Nav', { childIds: ['3'] }),
node('3', 'link', 'About', { backendDOMNodeId: 10 })
]
const result = await buildSnapshot(makeSender(nodes))
expect(result.snapshot).toContain('[Main Nav]')
expect(result.refs).toHaveLength(1)
expect(result.refs[0].name).toBe('About')
})
it('renders headings without refs', async () => {
const nodes: AXNode[] = [
node('1', 'WebArea', 'page', { childIds: ['2'] }),
node('2', 'heading', 'Welcome')
]
const result = await buildSnapshot(makeSender(nodes))
expect(result.snapshot).toContain('heading "Welcome"')
expect(result.refs).toHaveLength(0)
})
it('renders static text without refs', async () => {
const nodes: AXNode[] = [
node('1', 'WebArea', 'page', { childIds: ['2'] }),
node('2', 'staticText', 'Hello world')
]
const result = await buildSnapshot(makeSender(nodes))
expect(result.snapshot).toContain('text "Hello world"')
expect(result.refs).toHaveLength(0)
})
it('skips generic/none/presentation roles', async () => {
const nodes: AXNode[] = [
node('1', 'WebArea', 'page', { childIds: ['2'] }),
node('2', 'generic', '', { childIds: ['3'] }),
node('3', 'button', 'OK', { backendDOMNodeId: 10 })
]
const result = await buildSnapshot(makeSender(nodes))
expect(result.refs).toHaveLength(1)
expect(result.refs[0].name).toBe('OK')
expect(result.snapshot).not.toContain('generic')
})
it('skips ignored nodes but walks their children', async () => {
const nodes: AXNode[] = [
node('1', 'WebArea', 'page', { childIds: ['2'] }),
node('2', 'group', 'ignored group', { childIds: ['3'], ignored: true }),
node('3', 'button', 'Deep', { backendDOMNodeId: 10 })
]
const result = await buildSnapshot(makeSender(nodes))
expect(result.refs).toHaveLength(1)
expect(result.refs[0].name).toBe('Deep')
})
it('skips interactive elements without a name', async () => {
const nodes: AXNode[] = [
node('1', 'WebArea', 'page', { childIds: ['2', '3'] }),
node('2', 'button', '', { backendDOMNodeId: 10 }),
node('3', 'button', 'Labeled', { backendDOMNodeId: 11 })
]
const result = await buildSnapshot(makeSender(nodes))
expect(result.refs).toHaveLength(1)
expect(result.refs[0].name).toBe('Labeled')
})
it('populates refMap with backendDOMNodeId', async () => {
const nodes: AXNode[] = [
node('1', 'WebArea', 'page', { childIds: ['2'] }),
node('2', 'checkbox', 'Agree', { backendDOMNodeId: 42 })
]
const result = await buildSnapshot(makeSender(nodes))
const entry = result.refMap.get('@e1')
expect(entry).toBeDefined()
expect(entry!.backendDOMNodeId).toBe(42)
expect(entry!.role).toBe('checkbox')
expect(entry!.name).toBe('Agree')
})
it('indents children under landmarks', async () => {
const nodes: AXNode[] = [
node('1', 'WebArea', 'page', { childIds: ['2'] }),
node('2', 'main', '', { childIds: ['3'] }),
node('3', 'button', 'Action', { backendDOMNodeId: 10 })
]
const result = await buildSnapshot(makeSender(nodes))
const lines = result.snapshot.split('\n')
const mainLine = lines.find((l) => l.includes('[Main Content]'))
const buttonLine = lines.find((l) => l.includes('Action'))
expect(mainLine).toBeDefined()
expect(buttonLine).toBeDefined()
expect(buttonLine!.startsWith(' ')).toBe(true)
})
it('handles a realistic page structure', async () => {
const nodes: AXNode[] = [
node('1', 'WebArea', 'page', { childIds: ['2', '3', '4'] }),
node('2', 'banner', '', { childIds: ['5'] }),
node('3', 'main', '', { childIds: ['6', '7', '8'] }),
node('4', 'contentinfo', '', {}),
node('5', 'link', 'Logo', { backendDOMNodeId: 10 }),
node('6', 'heading', 'Dashboard'),
node('7', 'textbox', 'Search', { backendDOMNodeId: 20 }),
node('8', 'button', 'Go', { backendDOMNodeId: 21 })
]
const result = await buildSnapshot(makeSender(nodes))
expect(result.refs).toHaveLength(3)
expect(result.refs.map((r) => r.name)).toEqual(['Logo', 'Search', 'Go'])
expect(result.snapshot).toContain('[Header]')
expect(result.snapshot).toContain('[Main Content]')
expect(result.snapshot).toContain('[Footer]')
expect(result.snapshot).toContain('heading "Dashboard"')
})
})

@ -0,0 +1,262 @@
import type { BrowserSnapshotRef } from '../../shared/runtime-types'
export type CdpCommandSender = (
method: string,
params?: Record<string, unknown>
) => Promise<unknown>
type AXNode = {
nodeId: string
backendDOMNodeId?: number
role?: { type: string; value: string }
name?: { type: string; value: string }
properties?: { name: string; value: { type: string; value: unknown } }[]
childIds?: string[]
ignored?: boolean
}
type SnapshotEntry = {
ref: string
role: string
name: string
backendDOMNodeId: number
depth: number
}
export type SnapshotResult = {
snapshot: string
refs: BrowserSnapshotRef[]
refMap: Map<string, { backendDOMNodeId: number; role: string; name: string }>
}
const INTERACTIVE_ROLES = new Set([
'button',
'link',
'textbox',
'searchbox',
'combobox',
'checkbox',
'radio',
'switch',
'slider',
'spinbutton',
'menuitem',
'menuitemcheckbox',
'menuitemradio',
'tab',
'option',
'treeitem'
])
const LANDMARK_ROLES = new Set([
'banner',
'navigation',
'main',
'complementary',
'contentinfo',
'region',
'form',
'search'
])
const HEADING_PATTERN = /^heading$/
const SKIP_ROLES = new Set(['none', 'presentation', 'generic'])
export async function buildSnapshot(sendCommand: CdpCommandSender): Promise<SnapshotResult> {
await sendCommand('Accessibility.enable')
const { nodes } = (await sendCommand('Accessibility.getFullAXTree')) as { nodes: AXNode[] }
const nodeById = new Map<string, AXNode>()
for (const node of nodes) {
nodeById.set(node.nodeId, node)
}
const entries: SnapshotEntry[] = []
let refCounter = 1
const root = nodes[0]
if (!root) {
return { snapshot: '', refs: [], refMap: new Map() }
}
walkTree(root, nodeById, 0, entries, () => refCounter++)
const refMap = new Map<string, { backendDOMNodeId: number; role: string; name: string }>()
const refs: BrowserSnapshotRef[] = []
const lines: string[] = []
for (const entry of entries) {
const indent = ' '.repeat(entry.depth)
if (entry.ref) {
lines.push(`${indent}[${entry.ref}] ${entry.role} "${entry.name}"`)
refs.push({ ref: entry.ref, role: entry.role, name: entry.name })
refMap.set(entry.ref, {
backendDOMNodeId: entry.backendDOMNodeId,
role: entry.role,
name: entry.name
})
} else {
lines.push(`${indent}${entry.role} "${entry.name}"`)
}
}
return { snapshot: lines.join('\n'), refs, refMap }
}
function walkTree(
node: AXNode,
nodeById: Map<string, AXNode>,
depth: number,
entries: SnapshotEntry[],
nextRef: () => number
): void {
if (node.ignored) {
walkChildren(node, nodeById, depth, entries, nextRef)
return
}
const role = node.role?.value ?? ''
const name = node.name?.value ?? ''
if (SKIP_ROLES.has(role)) {
walkChildren(node, nodeById, depth, entries, nextRef)
return
}
const isInteractive = INTERACTIVE_ROLES.has(role)
const isHeading = HEADING_PATTERN.test(role)
const isLandmark = LANDMARK_ROLES.has(role)
const isStaticText = role === 'staticText' || role === 'StaticText'
if (!isInteractive && !isHeading && !isLandmark && !isStaticText) {
walkChildren(node, nodeById, depth, entries, nextRef)
return
}
if (!name && !isLandmark) {
walkChildren(node, nodeById, depth, entries, nextRef)
return
}
const hasFocusable = isInteractive && isFocusable(node)
if (isLandmark) {
entries.push({
ref: '',
role: formatLandmarkRole(role, name),
name: name || role,
backendDOMNodeId: node.backendDOMNodeId ?? 0,
depth
})
walkChildren(node, nodeById, depth + 1, entries, nextRef)
return
}
if (isHeading) {
entries.push({
ref: '',
role: 'heading',
name,
backendDOMNodeId: node.backendDOMNodeId ?? 0,
depth
})
return
}
if (isStaticText && name.trim().length > 0) {
entries.push({
ref: '',
role: 'text',
name: name.trim(),
backendDOMNodeId: node.backendDOMNodeId ?? 0,
depth
})
return
}
if (isInteractive && (hasFocusable || node.backendDOMNodeId)) {
const ref = `@e${nextRef()}`
entries.push({
ref,
role: formatInteractiveRole(role),
name,
backendDOMNodeId: node.backendDOMNodeId ?? 0,
depth
})
return
}
walkChildren(node, nodeById, depth, entries, nextRef)
}
function walkChildren(
node: AXNode,
nodeById: Map<string, AXNode>,
depth: number,
entries: SnapshotEntry[],
nextRef: () => number
): void {
if (!node.childIds) {
return
}
for (const childId of node.childIds) {
const child = nodeById.get(childId)
if (child) {
walkTree(child, nodeById, depth, entries, nextRef)
}
}
}
function isFocusable(node: AXNode): boolean {
if (!node.properties) {
return true
}
const focusable = node.properties.find((p) => p.name === 'focusable')
if (focusable && focusable.value.value === false) {
return false
}
return true
}
function formatInteractiveRole(role: string): string {
switch (role) {
case 'textbox':
case 'searchbox':
return 'text input'
case 'combobox':
return 'combobox'
case 'menuitem':
case 'menuitemcheckbox':
case 'menuitemradio':
return 'menu item'
case 'spinbutton':
return 'number input'
case 'treeitem':
return 'tree item'
default:
return role
}
}
function formatLandmarkRole(role: string, name: string): string {
if (name) {
return `[${name}]`
}
switch (role) {
case 'banner':
return '[Header]'
case 'navigation':
return '[Navigation]'
case 'main':
return '[Main Content]'
case 'complementary':
return '[Sidebar]'
case 'contentinfo':
return '[Footer]'
case 'search':
return '[Search]'
default:
return `[${role}]`
}
}
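
The ref numbering in `walkTree` can be hard to follow alongside the landmark, heading, and text handling. The sketch below isolates just that part: a depth-first walk over a flat node list that hands out `@eN` refs to named interactive nodes in document order. It is a reduced illustration under stated assumptions, not the engine itself: `FlatNode`, `assignRefs`, and the shortened role set are hypothetical, and landmarks, headings, and ignored nodes are treated as plain containers.

```typescript
// Simplified stand-in for an accessibility node (hypothetical shape).
type FlatNode = { id: string; role: string; name: string; childIds?: string[] }
type AssignedRef = { ref: string; role: string; name: string }

// Shortened role list for the sketch; the real set above is longer.
const SKETCH_INTERACTIVE_ROLES = new Set(['button', 'link', 'textbox', 'checkbox'])

function assignRefs(nodes: FlatNode[]): AssignedRef[] {
  // Index nodes by id so child ids can be resolved in constant time.
  const byId = new Map<string, FlatNode>()
  for (const n of nodes) {
    byId.set(n.id, n)
  }
  const refs: AssignedRef[] = []
  const visit = (node: FlatNode): void => {
    // Named interactive nodes get the next ref and are not descended into.
    if (SKETCH_INTERACTIVE_ROLES.has(node.role) && node.name !== '') {
      refs.push({ ref: `@e${refs.length + 1}`, role: node.role, name: node.name })
      return
    }
    // Everything else is a container: keep walking its children in order.
    for (const childId of node.childIds ?? []) {
      const child = byId.get(childId)
      if (child) {
        visit(child)
      }
    }
  }
  // The first node in the list is treated as the root, as in buildSnapshot.
  const root = nodes[0]
  if (root) {
    visit(root)
  }
  return refs
}

const exampleTree: FlatNode[] = [
  { id: '1', role: 'WebArea', name: 'page', childIds: ['2', '3'] },
  { id: '2', role: 'generic', name: '', childIds: ['4'] },
  { id: '3', role: 'link', name: 'Home' },
  { id: '4', role: 'button', name: 'OK' }
]
const exampleAssigned = assignRefs(exampleTree)
```

Because the walk is depth-first, the button nested under the generic container receives `@e1` before the sibling link receives `@e2`, which matches the visual reading order of the tree rather than the order of the flat list.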

@ -35,6 +35,8 @@ import { CodexAccountService } from './codex-accounts/service'
import { CodexRuntimeHomeService } from './codex-accounts/runtime-home-service'
import { openCodeHookService } from './opencode/hook-service'
import { StarNagService } from './star-nag/service'
import { CdpBridge } from './browser/cdp-bridge'
import { browserManager } from './browser/browser-manager'
let mainWindow: BrowserWindow | null = null
/** Whether a manual app.quit() (Cmd+Q, etc.) is in progress. Shared with the
@ -158,6 +160,7 @@ app.whenReady().then(async () => {
starNag = new StarNagService(store, stats)
starNag.start()
starNag.registerIpcHandlers()
runtime.setCdpBridge(new CdpBridge(browserManager))
nativeTheme.themeSource = store.getSettings().theme ?? 'system'
registerAppMenu({
onCheckForUpdates: () => checkForUpdatesFromMenu(),

@ -2,6 +2,7 @@
trust boundary (isTrustedBrowserRenderer) and handler teardown stay consistent. */
import { BrowserWindow, dialog, ipcMain } from 'electron'
import { browserManager } from '../browser/browser-manager'
import type { CdpBridge } from '../browser/cdp-bridge'
import { browserSessionRegistry } from '../browser/browser-session-registry'
import {
pickCookieFile,
@ -28,11 +29,16 @@ import type {
} from '../../shared/types'
let trustedBrowserRendererWebContentsId: number | null = null
let cdpBridgeRef: CdpBridge | null = null
export function setTrustedBrowserRendererWebContentsId(webContentsId: number | null): void {
trustedBrowserRendererWebContentsId = webContentsId
}
export function setCdpBridgeRef(bridge: CdpBridge | null): void {
cdpBridgeRef = bridge
}
function isTrustedBrowserRenderer(sender: Electron.WebContents): boolean {
if (sender.isDestroyed() || sender.getType() !== 'window') {
return false
@ -64,6 +70,7 @@ export function registerBrowserHandlers(): void {
ipcMain.removeHandler('browser:cancelGrab')
ipcMain.removeHandler('browser:captureSelectionScreenshot')
ipcMain.removeHandler('browser:extractHoverPayload')
ipcMain.removeHandler('browser:activeTabChanged')
ipcMain.handle(
'browser:registerGuest',
@ -71,10 +78,21 @@ export function registerBrowserHandlers(): void {
if (!isTrustedBrowserRenderer(event.sender)) {
return false
}
// Why: when Chromium swaps a guest's renderer process (navigation,
// crash recovery), the renderer re-registers the same browserPageId
// with a new webContentsId. If the CDP bridge was tracking the old
// webContentsId as active, update it to the new one so agent commands
// don't target a destroyed surface.
const previousWcId = browserManager.getGuestWebContentsId(args.browserPageId)
browserManager.registerGuest({
...args,
rendererWebContentsId: event.sender.id
})
if (cdpBridgeRef && previousWcId !== null && previousWcId !== args.webContentsId) {
if (cdpBridgeRef.getActiveWebContentsId() === previousWcId) {
cdpBridgeRef.onTabChanged(args.webContentsId)
}
}
return true
}
)
@ -83,10 +101,34 @@ export function registerBrowserHandlers(): void {
if (!isTrustedBrowserRenderer(event.sender)) {
return false
}
// Why: notify CDP bridge before unregistering so it can clean up debugger
// state and ref maps for the closing tab. Must happen before unregisterGuest
// clears the webContentsId mapping.
const wcId = browserManager.getGuestWebContentsId(args.browserPageId)
if (wcId !== null && cdpBridgeRef) {
cdpBridgeRef.onTabClosed(wcId)
}
browserManager.unregisterGuest(args.browserPageId)
return true
})
// Why: keeps the CDP bridge's active tab in sync with the renderer's UI state.
// Without this, a user switching tabs in the UI would leave the agent operating
// on the previous tab, which is confusing.
ipcMain.handle('browser:activeTabChanged', (event, args: { browserPageId: string }) => {
if (!isTrustedBrowserRenderer(event.sender)) {
return false
}
if (!cdpBridgeRef) {
return false
}
const wcId = browserManager.getGuestWebContentsId(args.browserPageId)
if (wcId !== null) {
cdpBridgeRef.onTabChanged(wcId)
}
return true
})
ipcMain.handle('browser:openDevTools', (event, args: { browserPageId: string }) => {
if (!isTrustedBrowserRenderer(event.sender)) {
return false
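The stale `webContentsId` handling in `browser:registerGuest` can be sketched in isolation as follows. `FakeBridge`, the `guests` map, and the numeric ids are hypothetical stand-ins; only the method names `getActiveWebContentsId` and `onTabChanged` come from the diff.

```typescript
// Hypothetical in-memory stand-in for the CDP bridge.
class FakeBridge {
  private activeWcId: number | null = null

  getActiveWebContentsId(): number | null {
    return this.activeWcId
  }

  onTabChanged(wcId: number): void {
    this.activeWcId = wcId
  }
}

// Hypothetical registry: browserPageId -> current guest webContentsId.
const guests = new Map<string, number>()

function registerGuest(bridge: FakeBridge, browserPageId: string, webContentsId: number): void {
  // Read the previous id before the mapping is overwritten.
  const previousWcId = guests.get(browserPageId) ?? null
  guests.set(browserPageId, webContentsId)
  // Follow the swap only if the bridge was pointed at the old surface.
  if (previousWcId !== null && previousWcId !== webContentsId) {
    if (bridge.getActiveWebContentsId() === previousWcId) {
      bridge.onTabChanged(webContentsId)
    }
  }
}

const bridge = new FakeBridge()
registerGuest(bridge, 'page-a', 11)
registerGuest(bridge, 'page-b', 22)
bridge.onTabChanged(11) // page-a is the active tab

// A background tab swaps its renderer: the active id stays untouched.
registerGuest(bridge, 'page-b', 23)
const afterBackgroundSwap = bridge.getActiveWebContentsId()

// The active tab swaps its renderer: the bridge follows to the new id.
registerGuest(bridge, 'page-a', 12)
const afterActiveSwap = bridge.getActiveWebContentsId()

console.log(afterBackgroundSwap, afterActiveSwap)
```

The ordering matters in the sketch for the same reason it does in the handler: the previous id has to be captured before the registry is updated, otherwise there is nothing left to compare against.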

@ -20,6 +20,7 @@ const {
registerUpdaterHandlersMock,
registerRateLimitHandlersMock,
registerBrowserHandlersMock,
setCdpBridgeRefMock,
setTrustedBrowserRendererWebContentsIdMock,
registerFilesystemWatcherHandlersMock,
registerAppHandlersMock
@ -43,6 +44,7 @@ const {
registerUpdaterHandlersMock: vi.fn(),
registerRateLimitHandlersMock: vi.fn(),
registerBrowserHandlersMock: vi.fn(),
setCdpBridgeRefMock: vi.fn(),
setTrustedBrowserRendererWebContentsIdMock: vi.fn(),
registerFilesystemWatcherHandlersMock: vi.fn(),
registerAppHandlersMock: vi.fn()
@ -123,7 +125,8 @@ vi.mock('../window/attach-main-window-services', () => ({
vi.mock('./browser', () => ({
registerBrowserHandlers: registerBrowserHandlersMock,
-setTrustedBrowserRendererWebContentsId: setTrustedBrowserRendererWebContentsIdMock
+setTrustedBrowserRendererWebContentsId: setTrustedBrowserRendererWebContentsIdMock,
setCdpBridgeRef: setCdpBridgeRefMock
}))
vi.mock('./app', () => ({
@ -153,6 +156,7 @@ describe('registerCoreHandlers', () => {
registerUpdaterHandlersMock.mockReset()
registerRateLimitHandlersMock.mockReset()
registerBrowserHandlersMock.mockReset()
setCdpBridgeRefMock.mockReset()
setTrustedBrowserRendererWebContentsIdMock.mockReset()
registerFilesystemWatcherHandlersMock.mockReset()
registerAppHandlersMock.mockReset()
@ -160,7 +164,7 @@ describe('registerCoreHandlers', () => {
it('passes the store through to handler registrars that need it', () => {
const store = { marker: 'store' }
-const runtime = { marker: 'runtime' }
+const runtime = { marker: 'runtime', getCdpBridge: () => null }
const stats = { marker: 'stats' }
const claudeUsage = { marker: 'claudeUsage' }
const codexUsage = { marker: 'codexUsage' }
@ -204,7 +208,7 @@ describe('registerCoreHandlers', () => {
// The first test already called registerCoreHandlers, so the module-level
// guard is now set. beforeEach reset all mocks, so call counts are 0.
const store2 = { marker: 'store2' }
-const runtime2 = { marker: 'runtime2' }
+const runtime2 = { marker: 'runtime2', getCdpBridge: () => null }
const stats2 = { marker: 'stats2' }
const claudeUsage2 = { marker: 'claudeUsage2' }
const codexUsage2 = { marker: 'codexUsage2' }

@ -14,7 +14,7 @@ import { registerStatsHandlers } from './stats'
import { registerRateLimitHandlers } from './rate-limits'
import { registerRuntimeHandlers } from './runtime'
import { registerNotificationHandlers } from './notifications'
-import { setTrustedBrowserRendererWebContentsId } from './browser'
+import { setTrustedBrowserRendererWebContentsId, setCdpBridgeRef } from './browser'
import { registerSessionHandlers } from './session'
import { registerSettingsHandlers } from './settings'
import { registerBrowserHandlers } from './browser'
@ -49,6 +49,7 @@ export function registerCoreHandlers(
// if a channel is registered twice, so we guard to register only once and
// just update the per-window web-contents ID on subsequent calls.
setTrustedBrowserRendererWebContentsId(mainWindowWebContentsId)
setCdpBridgeRef(runtime.getCdpBridge())
if (registered) {
return
}

@ -23,8 +23,22 @@ import type {
RuntimeSyncedLeaf,
RuntimeSyncedTab,
RuntimeSyncWindowGraph,
-RuntimeWorktreeListResult
+RuntimeWorktreeListResult,
BrowserSnapshotResult,
BrowserClickResult,
BrowserGotoResult,
BrowserFillResult,
BrowserTypeResult,
BrowserSelectResult,
BrowserScrollResult,
BrowserBackResult,
BrowserReloadResult,
BrowserScreenshotResult,
BrowserEvalResult,
BrowserTabListResult,
BrowserTabSwitchResult
} from '../../shared/runtime-types'
import type { CdpBridge } from '../browser/cdp-bridge'
import { getPRForBranch } from '../github/client'
import {
getGitUsername,
@ -149,6 +163,7 @@ export class OrcaRuntimeService {
private waitersByHandle = new Map<string, Set<TerminalWaiter>>()
private ptyController: RuntimePtyController | null = null
private notifier: RuntimeNotifier | null = null
private cdpBridge: CdpBridge | null = null
private resolvedWorktreeCache: ResolvedWorktreeCache | null = null
private agentDetector: AgentDetector | null = null
@ -189,6 +204,14 @@ export class OrcaRuntimeService {
this.notifier = notifier
}
setCdpBridge(bridge: CdpBridge | null): void {
this.cdpBridge = bridge
}
getCdpBridge(): CdpBridge | null {
return this.cdpBridge
}
attachWindow(windowId: number): void {
if (this.authoritativeWindowId === null) {
this.authoritativeWindowId = windowId
@ -1109,6 +1132,70 @@ export class OrcaRuntimeService {
private getLeafKey(tabId: string, leafId: string): string {
return `${tabId}::${leafId}`
}
// ── Browser automation ──
private requireCdpBridge(): CdpBridge {
if (!this.cdpBridge) {
throw new Error('runtime_unavailable')
}
return this.cdpBridge
}
async browserSnapshot(): Promise<BrowserSnapshotResult> {
return this.requireCdpBridge().snapshot()
}
async browserClick(params: { element: string }): Promise<BrowserClickResult> {
return this.requireCdpBridge().click(params.element)
}
async browserGoto(params: { url: string }): Promise<BrowserGotoResult> {
return this.requireCdpBridge().goto(params.url)
}
async browserFill(params: { element: string; value: string }): Promise<BrowserFillResult> {
return this.requireCdpBridge().fill(params.element, params.value)
}
async browserType(params: { input: string }): Promise<BrowserTypeResult> {
return this.requireCdpBridge().type(params.input)
}
async browserSelect(params: { element: string; value: string }): Promise<BrowserSelectResult> {
return this.requireCdpBridge().select(params.element, params.value)
}
async browserScroll(params: {
direction: 'up' | 'down'
amount?: number
}): Promise<BrowserScrollResult> {
return this.requireCdpBridge().scroll(params.direction, params.amount)
}
async browserBack(): Promise<BrowserBackResult> {
return this.requireCdpBridge().back()
}
async browserReload(): Promise<BrowserReloadResult> {
return this.requireCdpBridge().reload()
}
async browserScreenshot(params: { format?: 'png' | 'jpeg' }): Promise<BrowserScreenshotResult> {
return this.requireCdpBridge().screenshot(params.format)
}
async browserEval(params: { expression: string }): Promise<BrowserEvalResult> {
return this.requireCdpBridge().evaluate(params.expression)
}
browserTabList(): BrowserTabListResult {
return this.requireCdpBridge().tabList()
}
async browserTabSwitch(params: { index: number }): Promise<BrowserTabSwitchResult> {
return this.requireCdpBridge().tabSwitch(params.index)
}
}
const MAX_TAIL_LINES = 120
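The `requireCdpBridge` guard means every `browser*` method fails the same way until a bridge is attached. A minimal sketch of that behavior, assuming a cut-down bridge type (`TabListBridge`) and class (`RuntimeSketch`) that are not part of the commit:

```typescript
// Hypothetical minimal bridge shape; the real CdpBridge exposes many more methods.
type TabListBridge = {
  tabList(): { tabs: { index: number; url: string }[] }
}

class RuntimeSketch {
  private cdpBridge: TabListBridge | null = null

  setCdpBridge(bridge: TabListBridge | null): void {
    this.cdpBridge = bridge
  }

  // Single choke point: callers never see a null bridge.
  private requireCdpBridge(): TabListBridge {
    if (!this.cdpBridge) {
      throw new Error('runtime_unavailable')
    }
    return this.cdpBridge
  }

  browserTabList(): { tabs: { index: number; url: string }[] } {
    return this.requireCdpBridge().tabList()
  }
}

const runtime = new RuntimeSketch()

// Before a bridge is attached, the call throws with the shared message.
let unavailable = ''
try {
  runtime.browserTabList()
} catch (error) {
  unavailable = error instanceof Error ? error.message : String(error)
}

// After attaching a (fake) bridge, the call is delegated.
runtime.setCdpBridge({
  tabList: () => ({ tabs: [{ index: 0, url: 'https://example.com' }] })
})
const listed = runtime.browserTabList()

console.log(unavailable, listed.tabs.length)
```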

@ -701,6 +701,189 @@ export class OrcaRuntimeRpcServer {
}
}
// ── Browser automation routes ──
if (request.method === 'browser.snapshot') {
try {
const result = await this.runtime.browserSnapshot()
return this.successResponse(request.id, result)
} catch (error) {
return this.browserErrorResponse(request.id, error)
}
}
if (request.method === 'browser.click') {
try {
const params = this.extractParams(request)
const element = typeof params?.element === 'string' ? params.element : null
if (!element) {
return this.errorResponse(request.id, 'invalid_argument', 'Missing required --element')
}
const result = await this.runtime.browserClick({ element })
return this.successResponse(request.id, result)
} catch (error) {
return this.browserErrorResponse(request.id, error)
}
}
if (request.method === 'browser.goto') {
try {
const params = this.extractParams(request)
const url = typeof params?.url === 'string' ? params.url : null
if (!url) {
return this.errorResponse(request.id, 'invalid_argument', 'Missing required --url')
}
const result = await this.runtime.browserGoto({ url })
return this.successResponse(request.id, result)
} catch (error) {
return this.browserErrorResponse(request.id, error)
}
}
if (request.method === 'browser.fill') {
try {
const params = this.extractParams(request)
const element = typeof params?.element === 'string' ? params.element : null
const value = typeof params?.value === 'string' ? params.value : null
if (!element) {
return this.errorResponse(request.id, 'invalid_argument', 'Missing required --element')
}
if (value === null) {
return this.errorResponse(request.id, 'invalid_argument', 'Missing required --value')
}
const result = await this.runtime.browserFill({ element, value })
return this.successResponse(request.id, result)
} catch (error) {
return this.browserErrorResponse(request.id, error)
}
}
if (request.method === 'browser.type') {
try {
const params = this.extractParams(request)
const input = typeof params?.input === 'string' ? params.input : null
if (!input) {
return this.errorResponse(request.id, 'invalid_argument', 'Missing required --input')
}
const result = await this.runtime.browserType({ input })
return this.successResponse(request.id, result)
} catch (error) {
return this.browserErrorResponse(request.id, error)
}
}
if (request.method === 'browser.select') {
try {
const params = this.extractParams(request)
const element = typeof params?.element === 'string' ? params.element : null
const value = typeof params?.value === 'string' ? params.value : null
if (!element) {
return this.errorResponse(request.id, 'invalid_argument', 'Missing required --element')
}
if (value === null) {
return this.errorResponse(request.id, 'invalid_argument', 'Missing required --value')
}
const result = await this.runtime.browserSelect({ element, value })
return this.successResponse(request.id, result)
} catch (error) {
return this.browserErrorResponse(request.id, error)
}
}
if (request.method === 'browser.scroll') {
try {
const params = this.extractParams(request)
const direction = typeof params?.direction === 'string' ? params.direction : null
if (direction !== 'up' && direction !== 'down') {
return this.errorResponse(
request.id,
'invalid_argument',
'Missing required --direction (up or down)'
)
}
const amount =
typeof params?.amount === 'number' && params.amount > 0 ? params.amount : undefined
const result = await this.runtime.browserScroll({ direction, amount })
return this.successResponse(request.id, result)
} catch (error) {
return this.browserErrorResponse(request.id, error)
}
}
if (request.method === 'browser.back') {
try {
const result = await this.runtime.browserBack()
return this.successResponse(request.id, result)
} catch (error) {
return this.browserErrorResponse(request.id, error)
}
}
if (request.method === 'browser.reload') {
try {
const result = await this.runtime.browserReload()
return this.successResponse(request.id, result)
} catch (error) {
return this.browserErrorResponse(request.id, error)
}
}
if (request.method === 'browser.screenshot') {
try {
const params = this.extractParams(request)
const format =
typeof params?.format === 'string' &&
(params.format === 'png' || params.format === 'jpeg')
? params.format
: undefined
const result = await this.runtime.browserScreenshot({ format })
return this.successResponse(request.id, result)
} catch (error) {
return this.browserErrorResponse(request.id, error)
}
}
if (request.method === 'browser.eval') {
try {
const params = this.extractParams(request)
const expression = typeof params?.expression === 'string' ? params.expression : null
if (!expression) {
return this.errorResponse(request.id, 'invalid_argument', 'Missing required --expression')
}
const result = await this.runtime.browserEval({ expression })
return this.successResponse(request.id, result)
} catch (error) {
return this.browserErrorResponse(request.id, error)
}
}
if (request.method === 'browser.tabList') {
try {
const result = this.runtime.browserTabList()
return this.successResponse(request.id, result)
} catch (error) {
return this.browserErrorResponse(request.id, error)
}
}
if (request.method === 'browser.tabSwitch') {
try {
const params = this.extractParams(request)
const index = typeof params?.index === 'number' ? params.index : null
if (index === null || !Number.isInteger(index) || index < 0) {
return this.errorResponse(
request.id,
'invalid_argument',
'Missing required --index (non-negative integer)'
)
}
const result = await this.runtime.browserTabSwitch({ index })
return this.successResponse(request.id, result)
} catch (error) {
return this.browserErrorResponse(request.id, error)
}
}
return this.errorResponse(request.id, 'method_not_found', `Unknown method: ${request.method}`)
}
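The argument checks in the `browser.tabSwitch` and `browser.scroll` routes can be read as two small pure functions. The helper names below (`parseTabIndex`, `parseScrollAmount`) are hypothetical and only mirror the inline checks above; the routes in the commit do not factor them out.

```typescript
type Params = Record<string, unknown> | null

// Mirrors browser.tabSwitch: the index must be a non-negative integer.
function parseTabIndex(params: Params): number | null {
  const raw = params?.index
  if (typeof raw !== 'number' || !Number.isInteger(raw) || raw < 0) {
    return null
  }
  return raw
}

// Mirrors browser.scroll: a missing or non-positive amount is dropped rather
// than rejected, leaving the default to the callee.
function parseScrollAmount(params: Params): number | undefined {
  const raw = params?.amount
  return typeof raw === 'number' && raw > 0 ? raw : undefined
}

console.log(parseTabIndex({ index: 0 })) // 0 is a valid tab index
console.log(parseTabIndex({ index: '1' })) // numeric strings are rejected
console.log(parseScrollAmount({ amount: -5 })) // non-positive amounts are ignored
```

One consequence worth noting: because the check is `typeof ... === 'number'`, a client that sends `"1"` as a string gets `invalid_argument`, so the number conversion presumably has to happen on the CLI side before the request is sent.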
@ -718,6 +901,38 @@ export class OrcaRuntimeRpcServer {
}
}
private successResponse(id: string, result: unknown): RuntimeRpcResponse {
return {
id,
ok: true,
result,
_meta: {
runtimeId: this.runtime.getRuntimeId()
}
}
}
private extractParams(request: { params?: unknown }): Record<string, unknown> | null {
return request.params !== null && typeof request.params === 'object'
? (request.params as Record<string, unknown>)
: null
}
// Why: browser errors carry a structured .code property (BrowserError from
// cdp-bridge.ts) that maps directly to agent-facing error codes. We forward
// that code rather than relying on the message-matching pattern used by
// runtimeErrorResponse, which would require adding 10+ entries to the allowlist.
private browserErrorResponse(id: string, error: unknown): RuntimeRpcResponse {
if (
error instanceof Error &&
'code' in error &&
typeof (error as { code: unknown }).code === 'string'
) {
return this.errorResponse(id, (error as { code: string }).code, error.message)
}
return this.runtimeErrorResponse(id, error)
}
private runtimeErrorResponse(id: string, error: unknown): RuntimeRpcResponse {
const message = error instanceof Error ? error.message : String(error)
if (
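The code-forwarding check in `browserErrorResponse` can be exercised on its own. In this sketch, `BrowserError` is a hypothetical stand-in for the class in `cdp-bridge.ts` (its real constructor signature is not shown in this diff), and `extractErrorCode` is an illustrative helper that returns `null` where the real method would fall back to `runtimeErrorResponse`.

```typescript
// Hypothetical stand-in for the BrowserError class referenced in the comment.
class BrowserError extends Error {
  code: string

  constructor(code: string, message: string) {
    super(message)
    this.code = code
  }
}

// Returns the structured code when present, or null to signal the fallback path.
function extractErrorCode(error: unknown): string | null {
  if (
    error instanceof Error &&
    'code' in error &&
    typeof (error as { code: unknown }).code === 'string'
  ) {
    return (error as { code: string }).code
  }
  return null
}

console.log(extractErrorCode(new BrowserError('browser_stale_ref', 'Ref @e3 is stale')))
console.log(extractErrorCode(new Error('runtime_unavailable')))
console.log(extractErrorCode('plain string'))
```

The check is structural rather than an `instanceof BrowserError` test, so any `Error` carrying a string `code` is forwarded as-is. That includes Node.js system errors such as `ENOENT`, which may or may not be the intent here.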

@ -140,6 +140,7 @@ export type BrowserApi = {
browserProfile?: string
}) => Promise<BrowserCookieImportResult>
sessionClearDefaultCookies: () => Promise<boolean>
notifyActiveTabChanged: (args: { browserPageId: string }) => Promise<boolean>
}
export type DetectedBrowserProfileInfo = {

@ -748,7 +748,10 @@ const api = {
> => ipcRenderer.invoke('browser:session:importFromBrowser', args),
sessionClearDefaultCookies: (): Promise<boolean> =>
-ipcRenderer.invoke('browser:session:clearDefaultCookies')
+ipcRenderer.invoke('browser:session:clearDefaultCookies'),
notifyActiveTabChanged: (args: { browserPageId: string }): Promise<boolean> =>
ipcRenderer.invoke('browser:activeTabChanged', args)
},
hooks: {

@ -595,6 +595,17 @@ export const createBrowserSlice: StateCreator<AppState, [], [], BrowserSlice> =
}
})
// Why: notify the CDP bridge which guest webContents is now active so
// subsequent agent commands (snapshot, click, etc.) target the correct tab.
// registerGuest uses page IDs (not workspace IDs), so we resolve the active
// page within the workspace to find the correct browserPageId.
const workspace = findWorkspace(get().browserTabsByWorktree, tabId)
if (workspace?.activePageId && typeof window !== 'undefined' && window.api?.browser) {
window.api.browser
.notifyActiveTabChanged({ browserPageId: workspace.activePageId })
.catch(() => {})
}
const item = Object.values(get().unifiedTabsByWorktree)
.flat()
.find((entry) => entry.contentType === 'browser' && entry.entityId === tabId)
@ -796,6 +807,12 @@ export const createBrowserSlice: StateCreator<AppState, [], [], BrowserSlice> =
}
})
// Why: switching the active page within a workspace changes which guest
// webContents the CDP bridge should target for agent commands.
if (typeof window !== 'undefined' && window.api?.browser) {
window.api.browser.notifyActiveTabChanged({ browserPageId: pageId }).catch(() => {})
}
const workspace = findWorkspace(get().browserTabsByWorktree, workspaceId)
if (!workspace) {
return

@ -152,3 +152,89 @@ export type RuntimeWorktreeListResult = {
totalCount: number
truncated: boolean
}
// ── Browser automation types ──
export type BrowserSnapshotRef = {
ref: string
role: string
name: string
}
export type BrowserSnapshotResult = {
snapshot: string
refs: BrowserSnapshotRef[]
url: string
title: string
}
export type BrowserClickResult = {
clicked: string
}
export type BrowserGotoResult = {
url: string
title: string
}
export type BrowserFillResult = {
filled: string
}
export type BrowserTypeResult = {
typed: boolean
}
export type BrowserSelectResult = {
selected: string
}
export type BrowserScrollResult = {
scrolled: 'up' | 'down'
}
export type BrowserBackResult = {
url: string
title: string
}
export type BrowserReloadResult = {
url: string
title: string
}
export type BrowserScreenshotResult = {
data: string
format: 'png' | 'jpeg'
}
export type BrowserEvalResult = {
value: string
}
export type BrowserTabInfo = {
index: number
url: string
title: string
active: boolean
}
export type BrowserTabListResult = {
tabs: BrowserTabInfo[]
}
export type BrowserTabSwitchResult = {
switched: number
}
export type BrowserErrorCode =
| 'browser_no_tab'
| 'browser_tab_not_found'
| 'browser_stale_ref'
| 'browser_ref_not_found'
| 'browser_navigation_failed'
| 'browser_element_not_interactable'
| 'browser_eval_error'
| 'browser_cdp_error'
| 'browser_debugger_detached'
| 'browser_timeout'
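On the consuming side, a caller receives the error code as a plain string and may want to narrow it to `BrowserErrorCode` before acting on it. The sketch below is one possible way to do that; the `as const` array, `isBrowserErrorCode`, and `shouldResnapshot` are assumptions for illustration, and the only recovery rule encoded is the documented one (a stale ref calls for a re-snapshot and retry).

```typescript
// The same codes as the BrowserErrorCode union, kept as a runtime list so a
// type guard can check membership.
const BROWSER_ERROR_CODES = [
  'browser_no_tab',
  'browser_tab_not_found',
  'browser_stale_ref',
  'browser_ref_not_found',
  'browser_navigation_failed',
  'browser_element_not_interactable',
  'browser_eval_error',
  'browser_cdp_error',
  'browser_debugger_detached',
  'browser_timeout'
] as const

type BrowserErrorCode = (typeof BROWSER_ERROR_CODES)[number]

// Hypothetical guard: narrows an untrusted value to the union.
function isBrowserErrorCode(value: unknown): value is BrowserErrorCode {
  return (
    typeof value === 'string' && (BROWSER_ERROR_CODES as readonly string[]).includes(value)
  )
}

// Hypothetical helper: stale refs are recoverable by taking a fresh snapshot.
function shouldResnapshot(code: BrowserErrorCode): boolean {
  return code === 'browser_stale_ref'
}

const received: unknown = 'browser_stale_ref'
if (isBrowserErrorCode(received) && shouldResnapshot(received)) {
  console.log('re-run `orca snapshot`, then retry with the fresh ref')
}
```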