# Local Gateway — Backend Technical Specification
> Feature behaviour is defined in [local-gateway.md](./local-gateway.md).
> This document covers the backend implementation in
> `packages/cli/src/modules/instance-ai`.
---
## Table of Contents
1. [Component Overview](#1-component-overview)
2. [Authentication Model](#2-authentication-model)
3. [HTTP API](#3-http-api)
4. [Gateway Lifecycle](#4-gateway-lifecycle)
5. [Per-User Isolation](#5-per-user-isolation)
6. [Tool Call Dispatch](#6-tool-call-dispatch)
7. [Disconnect & Reconnect](#7-disconnect--reconnect)
8. [Module Settings](#8-module-settings)
---
## 1. Component Overview
The local gateway involves three runtime processes:
- **n8n server** — hosts the REST/SSE endpoints and orchestrates the AI agent.
- **computer-use daemon or local-gateway app** — runs on the user's local machine; executes tool calls.
- **Browser (frontend)** — initiates the connection and displays gateway status.
```mermaid
graph LR
FE[Browser / Frontend]
SRV[n8n Server]
DAEMON[computer-use Daemon\nlocal machine]
FE -- "POST /gateway/create-link\n(user auth)" --> SRV
FE -- "GET /gateway/status\n(user auth)" --> SRV
SRV -- "SSE push: instanceAiGatewayStateChanged\n(per-user)" --> FE
DAEMON -- "POST /gateway/init ➊\n(x-gateway-key, on connect & reconnect)" --> SRV
DAEMON <-- "GET /gateway/events?apiKey=... ➋\n(persistent SSE, tool call requests)" --> SRV
DAEMON -- "POST /gateway/response/:id\n(x-gateway-key, per tool call)" --> SRV
DAEMON -- "POST /gateway/disconnect\n(x-gateway-key, on shutdown)" --> SRV
```
> **➊ → ➋ ordering**: the daemon always calls `POST /gateway/init` before opening the SSE
> stream. The numbers indicate startup sequence, not request direction.
### Key classes
| Class | File | Responsibility |
|---|---|---|
| `LocalGatewayRegistry` | `filesystem/local-gateway-registry.ts` | Per-user state: tokens, session keys, timers, gateway instances |
| `LocalGateway` | `filesystem/local-gateway.ts` | Single-user MCP gateway: tool call dispatch, pending request tracking |
| `InstanceAiService` | `instance-ai.service.ts` | Thin delegation layer; exposes registry methods to the controller |
| `InstanceAiController` | `instance-ai.controller.ts` | HTTP endpoints; routes daemon requests to the correct user's gateway |
---
## 2. Authentication Model
The gateway uses two distinct authentication schemes for the two sides of the
connection.
### User-facing endpoints
Standard n8n session or API-key auth (`@Authenticated` / `@GlobalScope`).
The `userId` is taken from `req.user.id`.
### Daemon-facing endpoints (`skipAuth: true`)
These endpoints are not protected by the standard auth middleware. Instead,
they verify a **gateway API key** passed in one of two ways:
- `GET /gateway/events`: `?apiKey=<key>` query parameter (required for
  `EventSource`, which cannot set headers).
- All other daemon endpoints: `x-gateway-key` request header.
The key is resolved to a `userId` by `validateGatewayApiKey()` in the
controller:
```
1. If N8N_INSTANCE_AI_GATEWAY_API_KEY env var is set and matches → userId = 'env-gateway'
2. Otherwise look up the key in LocalGatewayRegistry.getUserIdForApiKey()
- Matches pairing tokens (TTL: 5 min, one-time use)
- Matches active session keys (persistent until explicit disconnect)
3. No match → ForbiddenError
```
Timing-safe comparison (`crypto.timingSafeEqual`) is used for the env-var
path to prevent timing attacks.

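
The resolution order above can be sketched as follows. This is a minimal sketch, assuming the registry lookup is injected as a plain function; `ApiKeyLookup`, `safeEqual` and the local `ForbiddenError` class are hypothetical names, not the controller's actual code. The explicit length check is there because `crypto.timingSafeEqual` throws when its two buffers differ in length.

```typescript
import { Buffer } from "buffer";
import { timingSafeEqual } from "crypto";

// Hypothetical stand-in for LocalGatewayRegistry.getUserIdForApiKey().
type ApiKeyLookup = (key: string) => string | undefined;

// Hypothetical error class; n8n's own ForbiddenError is not reproduced here.
class ForbiddenError extends Error {}

// Constant-time string comparison. timingSafeEqual throws on a length
// mismatch, so lengths are compared first (this reveals only the length).
function safeEqual(a: string, b: string): boolean {
  const bufA = Buffer.from(a, "utf8");
  const bufB = Buffer.from(b, "utf8");
  if (bufA.length !== bufB.length) return false;
  return timingSafeEqual(bufA, bufB);
}

// key: the ?apiKey query value (events stream) or the x-gateway-key header.
// envKey: the value of N8N_INSTANCE_AI_GATEWAY_API_KEY, if set.
function validateGatewayApiKey(
  key: string | undefined,
  envKey: string | undefined,
  lookup: ApiKeyLookup,
): string {
  if (!key) throw new ForbiddenError("Missing gateway API key");

  // 1. Static env-var key, compared in constant time.
  if (envKey && safeEqual(key, envKey)) return "env-gateway";

  // 2. Pairing token or active session key via the reverse lookup.
  const userId = lookup(key);
  if (userId !== undefined) return userId;

  // 3. No match.
  throw new ForbiddenError("Invalid gateway API key");
}
```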
---
## 3. HTTP API
All paths are prefixed with `/api/v1/instance-ai`.
### User-facing
| Method | Path | Auth | Description |
|---|---|---|---|
| `POST` | `/gateway/create-link` | User | Generate a pairing token; returns `{ token, command }` |
| `GET` | `/gateway/status` | User | Returns `{ connected, connectedAt, directory }` for the requesting user |
### Daemon-facing (`skipAuth`)
| Method | Path | Auth | Description |
|---|---|---|---|
| `GET` | `/gateway/events` | API key (`?apiKey`) | SSE stream; emits tool call requests to the daemon |
| `POST` | `/gateway/init` | API key (`x-gateway-key`) | Daemon announces capabilities; swaps pairing token for session key |
| `POST` | `/gateway/response/:requestId` | API key (`x-gateway-key`) | Daemon delivers a tool call result or error |
| `POST` | `/gateway/disconnect` | API key (`x-gateway-key`) | Daemon gracefully terminates the connection |
#### POST `/gateway/create-link` — response
```typescript
{
token: string; // gw_<nanoid(32)> — pairing token for /gateway/init
command: string; // "npx @n8n/computer-use <baseUrl> <token>"
}
```
#### GET `/gateway/status` — response
```typescript
{
connected: boolean;
connectedAt: string | null; // ISO timestamp
directory: string | null; // rootPath advertised by daemon
}
```
#### POST `/gateway/init` — request body
```typescript
// InstanceAiGatewayCapabilities
{
rootPath: string; // Filesystem root the daemon exposes
tools: McpTool[]; // MCP tool definitions the daemon supports
}
```
Response:
- `{ ok: true, sessionKey: string }` on first connect (pairing token exchanged).
- `{ ok: true }` when reconnecting with an active session key.
#### POST `/gateway/response/:requestId` — request body
```typescript
{
result?: {
content: Array<
| { type: 'text'; text: string }
| { type: 'image'; data: string; mimeType: string }
>;
isError?: boolean;
};
error?: string;
}
```
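
A hypothetical helper that turns this body into a tool result might look like the sketch below. It assumes that a top-level `error` string is surfaced as a rejected call, while `result.isError` is passed through to the agent unchanged; the spec does not state how the server treats each field, and `toToolResult` and `GatewayResponseBody` are illustrative names.

```typescript
type McpContent =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string };

interface McpToolCallResult {
  content: McpContent[];
  isError?: boolean;
}

// Body of POST /gateway/response/:requestId, as documented above.
interface GatewayResponseBody {
  result?: McpToolCallResult;
  error?: string;
}

// Hypothetical normaliser: a transport-level error becomes an exception;
// a tool-level failure (result.isError) is returned to the caller as-is.
function toToolResult(body: GatewayResponseBody): McpToolCallResult {
  if (typeof body.error === "string") {
    throw new Error(body.error);
  }
  if (!body.result || !Array.isArray(body.result.content)) {
    throw new Error("Gateway response has neither result nor error");
  }
  return body.result;
}
```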
---
## 4. Gateway Lifecycle
### 4.1 Initial connection
```mermaid
sequenceDiagram
participant FE as Browser
participant SRV as n8n Server
participant D as computer-use Daemon
FE->>SRV: POST /gateway/create-link (user auth)
SRV-->>FE: { token: "gw_...", command: "npx @n8n/computer-use ..." }
Note over FE: User runs the command on their machine
D->>SRV: POST /gateway/init (x-gateway-key: gw_...)
Note over D: uploadCapabilities() — resolves tool definitions,<br/>then POSTs rootPath + McpTool[]
Note over SRV: consumePairingToken(userId, token)<br/>Issues session key sess_...
SRV-->>D: { ok: true, sessionKey: "sess_..." }
Note over D: Stores session key, uses it for all<br/>subsequent requests instead of the pairing token
D->>SRV: GET /gateway/events?apiKey=sess_... (SSE, persistent)
Note over SRV: SSE connection held open,<br/>tool call requests streamed as events
SRV-->>FE: push: instanceAiGatewayStateChanged { connected: true, directory }
```
### 4.2 Reconnection with existing session key
After the initial handshake the daemon persists the session key in memory.
On reconnect (e.g. after a transient network drop):
```mermaid
sequenceDiagram
participant D as computer-use Daemon
participant SRV as n8n Server
D->>SRV: POST /gateway/init (x-gateway-key: sess_...)
Note over SRV: Session key found → userId<br/>initGateway(userId, capabilities), no token consumed
SRV-->>D: { ok: true }
D->>SRV: GET /gateway/events?apiKey=sess_... (SSE, persistent)
Note over SRV: SSE connection re-established
```
`generatePairingToken()` also short-circuits: if an active session key
already exists for the user, it is returned directly, so a new pairing token
is never issued while a session is live.
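
From the daemon's side, the two flows differ only in which key is sent. The sketch below assumes an injected `PostFn` transport and a `baseUrl` that already includes the `/api/v1/instance-ai` prefix; `DaemonSession` and `PostFn` are hypothetical and do not mirror the real `GatewayClient`. It prefers the session key once one has been issued and builds the `?apiKey=` URL for the SSE stream.

```typescript
// Hypothetical transport signature; the real daemon would issue an HTTP POST.
type PostFn = (
  path: string,
  headers: Record<string, string>,
  body: unknown,
) => Promise<{ ok: boolean; sessionKey?: string }>;

interface Capabilities {
  rootPath: string;
  tools: unknown[]; // McpTool[] in the real payload
}

class DaemonSession {
  private sessionKey: string | null = null;
  private readonly baseUrl: string;
  private readonly pairingToken: string;
  private readonly post: PostFn;

  constructor(baseUrl: string, pairingToken: string, post: PostFn) {
    this.baseUrl = baseUrl;
    this.pairingToken = pairingToken;
    this.post = post;
  }

  // Once a session key has been issued, it replaces the pairing token.
  private currentKey(): string {
    return this.sessionKey ?? this.pairingToken;
  }

  // Runs POST /gateway/init, then returns the URL for GET /gateway/events.
  async init(capabilities: Capabilities): Promise<string> {
    const res = await this.post(
      "/gateway/init",
      { "x-gateway-key": this.currentKey() },
      capabilities,
    );
    if (!res.ok) throw new Error("Gateway init rejected");
    // Only the pairing-token init returns a session key; reconnects get { ok: true }.
    if (res.sessionKey) this.sessionKey = res.sessionKey;

    const url = new URL(`${this.baseUrl}/gateway/events`);
    url.searchParams.set("apiKey", this.currentKey());
    return url.toString();
  }
}
```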
### 4.3 Token & key lifecycle
```
generatePairingToken(userId)
  │ Existing session key? ──yes──▶ return session key
  │ Valid pairing token?  ──yes──▶ return existing token
  └ Otherwise ───────────────────▶ create gw_<nanoid>, register in reverse lookup

consumePairingToken(userId, token)
  │ Validates token matches & is within TTL (5 min)
  │ Deletes pairing token from reverse lookup
  │ Creates sess_<nanoid>, registers in reverse lookup
  └─▶ returns session key

clearActiveSessionKey(userId)
  │ Deletes session key from reverse lookup
  └ Sets activeSessionKey to null (daemon must re-pair on next connect)
```
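
The lifecycle above can be condensed into a small in-memory registry. This is a sketch under stated assumptions: `TokenRegistry` is a hypothetical reduction of `LocalGatewayRegistry` to its token logic, `randomBytes(24).toString("base64url")` stands in for `nanoid(32)` (same 32-character length and alphabet), the clock is injectable for testing, and dropping an expired token from the reverse lookup when a new one is generated is an assumption.

```typescript
import { randomBytes } from "crypto";

const PAIRING_TOKEN_TTL_MS = 5 * 60 * 1000;

// 24 random bytes encode to 32 base64url characters (illustrative nanoid substitute).
function randomId(): string {
  return randomBytes(24).toString("base64url");
}

interface TokenState {
  pairingToken: { token: string; createdAt: number } | null;
  activeSessionKey: string | null;
}

class TokenRegistry {
  private readonly users = new Map<string, TokenState>();
  private readonly apiKeyToUserId = new Map<string, string>(); // reverse lookup
  private readonly now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  private stateFor(userId: string): TokenState {
    let state = this.users.get(userId);
    if (!state) {
      state = { pairingToken: null, activeSessionKey: null };
      this.users.set(userId, state);
    }
    return state;
  }

  private isFresh(token: { createdAt: number }): boolean {
    return this.now() - token.createdAt < PAIRING_TOKEN_TTL_MS;
  }

  generatePairingToken(userId: string): string {
    const state = this.stateFor(userId);
    if (state.activeSessionKey) return state.activeSessionKey; // live session wins
    if (state.pairingToken && this.isFresh(state.pairingToken)) return state.pairingToken.token;
    if (state.pairingToken) this.apiKeyToUserId.delete(state.pairingToken.token); // expired
    const token = `gw_${randomId()}`;
    state.pairingToken = { token, createdAt: this.now() };
    this.apiKeyToUserId.set(token, userId);
    return token;
  }

  consumePairingToken(userId: string, token: string): string | null {
    const state = this.stateFor(userId);
    const pairing = state.pairingToken;
    if (!pairing || pairing.token !== token || !this.isFresh(pairing)) return null;
    this.apiKeyToUserId.delete(token); // one-time use
    state.pairingToken = null;
    const sessionKey = `sess_${randomId()}`;
    state.activeSessionKey = sessionKey;
    this.apiKeyToUserId.set(sessionKey, userId);
    return sessionKey;
  }

  clearActiveSessionKey(userId: string): void {
    const state = this.users.get(userId);
    if (!state || !state.activeSessionKey) return;
    this.apiKeyToUserId.delete(state.activeSessionKey);
    state.activeSessionKey = null; // daemon must re-pair
  }

  getUserIdForApiKey(key: string): string | undefined {
    const userId = this.apiKeyToUserId.get(key);
    if (userId === undefined) return undefined;
    const pairing = this.users.get(userId)?.pairingToken;
    // An expired pairing token no longer authenticates.
    if (pairing && pairing.token === key && !this.isFresh(pairing)) return undefined;
    return userId;
  }
}
```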
---
## 5. Per-User Isolation
All gateway state is held in `LocalGatewayRegistry`, which maintains two
maps:
```
userGateways: Map<userId, UserGatewayState>
apiKeyToUserId: Map<token|sessionKey, userId> ← reverse lookup
```
`UserGatewayState` contains:
```typescript
interface UserGatewayState {
gateway: LocalGateway;
pairingToken: { token: string; createdAt: number } | null;
activeSessionKey: string | null;
disconnectTimer: ReturnType<typeof setTimeout> | null;
reconnectCount: number;
}
```
**Isolation guarantees:**
- Daemon endpoints resolve a `userId` from `validateGatewayApiKey()` and
operate exclusively on that user's `UserGatewayState`. No endpoint accepts
a `userId` from the request body.
- `getGateway(userId)` creates state lazily; `findGateway(userId)` returns
`undefined` if no state exists (used in `executeRun` to avoid allocating
state for users who have never connected).
- Pairing tokens and session keys are globally unique (`nanoid(32)`) and
never shared across users.
- `disconnectAll()` on shutdown iterates `userGateways.values()` and tears
down every gateway in isolation.
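
The lazy-versus-read-only lookup and the shutdown loop can be illustrated with the sketch below. `FakeGateway` and `IsolatedRegistry` are hypothetical stand-ins for `LocalGateway` and `LocalGatewayRegistry`; the `try`/`catch` in `disconnectAll()` is an assumption about what tearing down "in isolation" means, namely that one failing gateway does not block the rest.

```typescript
// Hypothetical minimal stand-in for LocalGateway.
class FakeGateway {
  connected = false;

  disconnect(): void {
    this.connected = false;
  }
}

interface UserState {
  gateway: FakeGateway;
  reconnectCount: number;
}

class IsolatedRegistry {
  private readonly userGateways = new Map<string, UserState>();

  // Allocates state on first use (daemon-facing paths).
  getGateway(userId: string): FakeGateway {
    let state = this.userGateways.get(userId);
    if (!state) {
      state = { gateway: new FakeGateway(), reconnectCount: 0 };
      this.userGateways.set(userId, state);
    }
    return state.gateway;
  }

  // Never allocates (e.g. when a run only checks for a connected gateway).
  findGateway(userId: string): FakeGateway | undefined {
    return this.userGateways.get(userId)?.gateway;
  }

  // Shutdown: each gateway is torn down independently.
  disconnectAll(): void {
    for (const state of this.userGateways.values()) {
      try {
        state.gateway.disconnect();
      } catch {
        // Assumption: one failing gateway must not block the others.
      }
    }
  }

  get userCount(): number {
    return this.userGateways.size;
  }
}
```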
---
## 6. Tool Call Dispatch
### 6.1 Normal tool call (no confirmation required)
When the AI agent needs to invoke a local tool, the call flows through
`LocalGateway`:
```mermaid
sequenceDiagram
participant A as AI Agent (Mastra tool)
participant GW as LocalGateway
participant SRV as Controller (SSE)
participant D as computer-use Daemon
A->>GW: callTool({ name, args })
GW->>GW: generate requestId, create Promise (30 s timeout)
GW->>SRV: emit "filesystem-request" via EventEmitter
SRV-->>D: SSE event: { type: "filesystem-request", payload: { requestId, toolCall } }
D->>D: execute tool locally
D->>SRV: POST /gateway/response/:requestId { result }
SRV->>GW: resolveRequest(userId, requestId, result)
GW->>GW: resolve Promise, clear timeout
GW-->>A: McpToolCallResult
```
If the daemon does not respond within 30 seconds, the promise rejects and
the agent receives a tool-error event.

If the gateway disconnects while requests are pending, `LocalGateway.disconnect()`
rejects all outstanding promises immediately with `"Local gateway disconnected"`.
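
The pending-request bookkeeping in the diagram amounts to a map from `requestId` to a promise's resolve/reject pair plus a timer. A minimal sketch, assuming a counter-based `requestId` (the real generator is not specified here) and a configurable timeout so it can be tested quickly; `GatewaySketch` is a hypothetical name, and only the `"filesystem-request"` event name, the 30 s default and the disconnect message come from this spec.

```typescript
import { EventEmitter } from "events";

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface McpToolCallResult {
  content: Array<{ type: "text"; text: string } | { type: "image"; data: string; mimeType: string }>;
  isError?: boolean;
}

interface Pending {
  resolve: (result: McpToolCallResult) => void;
  reject: (error: Error) => void;
  timer: ReturnType<typeof setTimeout>;
}

class GatewaySketch extends EventEmitter {
  private readonly pending = new Map<string, Pending>();
  private nextId = 0;
  private readonly timeoutMs: number;

  constructor(timeoutMs = 30_000) {
    super();
    this.timeoutMs = timeoutMs;
  }

  callTool(toolCall: ToolCall): Promise<McpToolCallResult> {
    const requestId = `req_${++this.nextId}`;
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pending.delete(requestId);
        reject(new Error(`Tool call ${toolCall.name} timed out`));
      }, this.timeoutMs);
      this.pending.set(requestId, { resolve, reject, timer });
      // The controller forwards this event to the daemon's SSE stream.
      this.emit("filesystem-request", { requestId, toolCall });
    });
  }

  // Called when POST /gateway/response/:requestId arrives.
  resolveRequest(requestId: string, result: McpToolCallResult): boolean {
    const entry = this.pending.get(requestId);
    if (!entry) return false; // unknown, or already timed out
    clearTimeout(entry.timer);
    this.pending.delete(requestId);
    entry.resolve(result);
    return true;
  }

  // Rejects every outstanding call at once.
  disconnect(): void {
    for (const entry of this.pending.values()) {
      clearTimeout(entry.timer);
      entry.reject(new Error("Local gateway disconnected"));
    }
    this.pending.clear();
  }
}
```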
### 6.2 Tool call with resource-access confirmation
When a tool group operates in `Ask` mode and no stored rule matches the
resource, the daemon returns a `GATEWAY_CONFIRMATION_REQUIRED` error instead
of a result. The Mastra tool layer handles this by suspending the agent —
persisting its state to the database — and resuming it after the user
responds. This means the confirmation survives page reloads and server
restarts.
```mermaid
sequenceDiagram
participant FE as Browser (Frontend)
participant SRV as n8n Server
participant DB as Database
participant D as computer-use Daemon
Note over SRV: First invocation — tool execute() called by Mastra
SRV->>D: callTool({ name, args }) via LocalGateway
D-->>SRV: { isError: true, content: ["GATEWAY_CONFIRMATION_REQUIRED::..."] }
SRV->>SRV: parse GatewayConfirmationRequiredPayload
SRV->>DB: suspend() — persist agent snapshot + confirmation payload
SRV-->>FE: SSE confirmation-request event<br/>{ inputType: "resource-decision", resourceDecision: { resource, description, options: [...] } }
FE->>FE: show GatewayResourceDecision panel
Note over FE: User clicks a decision button (e.g. Allow for session)
FE->>SRV: POST /confirm/:requestId { approved: true, resourceDecision: "allowForSession" }
SRV->>DB: load agent snapshot, resume with resumeData
Note over SRV: Second invocation — tool execute() called with resumeData
SRV->>D: callTool({ name, args, _confirmation: "allowForSession" }) via LocalGateway
D->>D: apply decision, execute tool
D-->>SRV: { content: [...], isError: false }
SRV-->>FE: SSE tool-result / text-delta events
```
**Key properties of this design:**
- Agent state is persisted to the database on suspension — the confirmation
dialog survives page reloads and server restarts.
- The daemon returns `options` as a plain list of decision names (e.g.
`["allowOnce", "allowForSession", "alwaysAllow", "denyOnce", "alwaysDeny"]`).
The user's choice is sent back as the decision string directly — no token
indirection.
- `_confirmation` is always stripped from LLM-provided args on the first-call
path, so the agent cannot bypass the HITL flow by injecting a decision.
- If the user denies without providing a decision, `resumeData.resourceDecision`
is absent and the tool returns an access-denied error to the agent
without re-calling the daemon.
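
The two `execute()` passes can be sketched as functions over an injected `callTool`. Assumptions: `CallTool`, `Outcome`, `stripConfirmation`, `executeFirst` and `executeResumed` are hypothetical names; the text after `GATEWAY_CONFIRMATION_REQUIRED::` is kept opaque because its format is not given here; persisting the snapshot and calling `suspend()` are left to the caller.

```typescript
const CONFIRMATION_PREFIX = "GATEWAY_CONFIRMATION_REQUIRED::";

type Decision = "allowOnce" | "allowForSession" | "alwaysAllow" | "denyOnce" | "alwaysDeny";

type Content =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string };

interface ToolResult {
  content: Content[];
  isError?: boolean;
}

// Hypothetical bridge to LocalGateway.callTool().
type CallTool = (name: string, args: Record<string, unknown>) => Promise<ToolResult>;

type Outcome =
  | { kind: "result"; result: ToolResult }
  | { kind: "suspend"; rawPayload: string } // caller persists state and suspends
  | { kind: "denied" };

// The LLM must never be able to pre-answer the confirmation.
function stripConfirmation(args: Record<string, unknown>): Record<string, unknown> {
  const copy = { ...args };
  delete copy._confirmation;
  return copy;
}

// First invocation: a confirmation marker in an error result means "suspend".
async function executeFirst(callTool: CallTool, name: string, llmArgs: Record<string, unknown>): Promise<Outcome> {
  const result = await callTool(name, stripConfirmation(llmArgs));
  if (result.isError) {
    for (const item of result.content) {
      if (item.type === "text" && item.text.startsWith(CONFIRMATION_PREFIX)) {
        return { kind: "suspend", rawPayload: item.text.slice(CONFIRMATION_PREFIX.length) };
      }
    }
  }
  return { kind: "result", result };
}

// Second invocation, with the user's decision taken from resumeData.
async function executeResumed(
  callTool: CallTool,
  name: string,
  llmArgs: Record<string, unknown>,
  resourceDecision: Decision | undefined,
): Promise<Outcome> {
  // No decision: deny without re-calling the daemon.
  if (resourceDecision === undefined) return { kind: "denied" };
  const args = { ...stripConfirmation(llmArgs), _confirmation: resourceDecision };
  return { kind: "result", result: await callTool(name, args) };
}
```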
---
## 7. Disconnect & Reconnect
### Explicit disconnect (user or daemon-initiated)
`POST /gateway/disconnect`:
1. `clearDisconnectTimer(userId)` — cancels any pending grace timer.
2. `disconnectGateway(userId)` — marks gateway disconnected, rejects pending
tool calls.
3. `clearActiveSessionKey(userId)` — removes session key from reverse lookup.
The daemon must re-pair on the next connect.
4. Push notification sent to user: `instanceAiGatewayStateChanged { connected: false }`.
### Unexpected SSE drop (daemon crash / network loss)
Both sides react independently when the SSE connection drops.
**Daemon side** (`GatewayClient.connectSSE` — `onerror` handler):
1. Closes the broken `EventSource`.
2. Classifies the error:
- **Auth error** (HTTP 403 / 500) → calls `reInitialize()`: re-uploads
capabilities via `POST /gateway/init`, then reopens SSE. This handles
the case where the server restarted and lost the session key.
After 5 consecutive auth failures the daemon gives up and calls
`onPersistentFailure()`.
- **Any other error** → reopens SSE directly (session key is still valid).
3. Applies exponential backoff before each retry: `1s → 2s → 4s → … → 30s (cap)`.
4. The backoff sequence and the auth-failure counter both reset on the next successful `onopen`.
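
The daemon's retry policy can be expressed as a small state machine, as in the sketch below. It assumes the `onerror` handler can obtain an HTTP status (how it does so depends on the `EventSource` implementation), treats the fifth consecutive auth failure as the give-up point, and resets both counters only in `onOpen()`; `ReconnectPolicy` and `RetryAction` are hypothetical names.

```typescript
const MAX_AUTH_FAILURES = 5;
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 30_000;

type RetryAction = "reinitialize" | "reopen" | "give-up";

class ReconnectPolicy {
  private attempt = 0;
  private authFailures = 0;

  // Called from the onerror handler after the broken EventSource is closed.
  onError(status: number | undefined): { action: RetryAction; delayMs: number } {
    const isAuthError = status === 403 || status === 500;
    if (isAuthError) {
      this.authFailures++;
      // The caller would invoke onPersistentFailure() here.
      if (this.authFailures >= MAX_AUTH_FAILURES) return { action: "give-up", delayMs: 0 };
    }
    // 1s, 2s, 4s, ... capped at 30s.
    const delayMs = Math.min(BASE_DELAY_MS * 2 ** this.attempt, MAX_DELAY_MS);
    this.attempt++;
    // Auth errors re-run POST /gateway/init; others just reopen SSE.
    return { action: isAuthError ? "reinitialize" : "reopen", delayMs };
  }

  // Called from onopen: a successful connection resets both counters.
  onOpen(): void {
    this.attempt = 0;
    this.authFailures = 0;
  }
}
```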
**Server side** (`startDisconnectTimer` in `LocalGatewayRegistry`):
1. Starts a grace period before marking the gateway disconnected:
- Grace period uses exponential backoff: `min(10s × 2^reconnectCount, 120s)`
- `reconnectCount` increments each time the grace period expires.
2. If the daemon reconnects within the grace period:
- `clearDisconnectTimer(userId)` cancels the timer.
- `initGateway(userId, capabilities)` resets `reconnectCount = 0`.
3. If the grace period expires:
- `disconnectGateway(userId)` marks the gateway disconnected and rejects
pending tool calls.
- The session key is **kept** — the daemon can still re-authenticate
without re-pairing.
- `onDisconnect` fires, sending `instanceAiGatewayStateChanged { connected: false }`.
```
Server grace period:
reconnectCount: 0 1 2 3 ... n
grace period: 10 s 20 s 40 s 80 s ... 120 s (cap)
Daemon retry delay:
retry: 1 2 3 4 ... n
delay: 1 s 2 s 4 s 8 s ... 30 s (cap)
```
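
Both schedules follow `min(base × 2^n, cap)`. The sketch below shows the server side: the grace-period formula plus a condensed start/clear/reset cycle. `TimerState`, `gracePeriodMs` and `onReconnected` are hypothetical names; `onExpire` stands in for `disconnectGateway()` followed by `onDisconnect`, and, unlike an explicit disconnect, nothing here touches the session key.

```typescript
const BASE_GRACE_MS = 10_000;
const MAX_GRACE_MS = 120_000;

// 10s, 20s, 40s, 80s, then capped at 120s.
function gracePeriodMs(reconnectCount: number): number {
  return Math.min(BASE_GRACE_MS * 2 ** reconnectCount, MAX_GRACE_MS);
}

interface TimerState {
  disconnectTimer: ReturnType<typeof setTimeout> | null;
  reconnectCount: number;
}

function clearDisconnectTimer(state: TimerState): void {
  if (state.disconnectTimer) {
    clearTimeout(state.disconnectTimer);
    state.disconnectTimer = null;
  }
}

// Condensed startDisconnectTimer: runs when the SSE stream drops.
function startDisconnectTimer(state: TimerState, onExpire: () => void): void {
  clearDisconnectTimer(state);
  state.disconnectTimer = setTimeout(() => {
    state.disconnectTimer = null;
    state.reconnectCount++; // next drop gets a longer grace period
    onExpire(); // mark disconnected, reject pending calls; session key is kept
  }, gracePeriodMs(state.reconnectCount));
}

// Daemon came back within the grace period (initGateway path).
function onReconnected(state: TimerState): void {
  clearDisconnectTimer(state);
  state.reconnectCount = 0;
}
```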
---
## 8. Module Settings
`InstanceAiModule.settings()` returns global (non-user-specific) values to
the frontend. Gateway connection status is **not** included because it is
per-user.
```typescript
{
enabled: boolean; // Model is configured and usable
localGateway: boolean; // Local filesystem path is configured
localGatewayDisabled: boolean; // Admin/user opt-out flag
localGatewayFallbackDirectory: string | null; // Configured fallback path
}
```
Per-user gateway state is delivered via two mechanisms:
- **Initial load** — `GET /gateway/status` (called on page mount).
- **Live updates** — targeted push notification `instanceAiGatewayStateChanged`
sent only to the affected user via `push.sendToUsers(..., [userId])`.
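
On the frontend, the two mechanisms can be combined in a small store: hydrate from `GET /gateway/status`, then apply each `instanceAiGatewayStateChanged` push. This is a hypothetical sketch; `GatewayStatusStore` is an illustrative name, and the push payload is assumed to carry only `connected` and `directory` (as in the sequence diagrams), so after a fresh connect `connectedAt` stays `null` until the status endpoint is queried again.

```typescript
// Response of GET /gateway/status.
interface GatewayStatus {
  connected: boolean;
  connectedAt: string | null;
  directory: string | null;
}

// Assumed push payload shape: { connected, directory? }.
interface GatewayStateChanged {
  connected: boolean;
  directory?: string | null;
}

class GatewayStatusStore {
  status: GatewayStatus = { connected: false, connectedAt: null, directory: null };

  // Initial load on page mount.
  hydrate(initial: GatewayStatus): void {
    this.status = { ...initial };
  }

  // Live update from the per-user push message.
  applyPush(event: GatewayStateChanged): void {
    if (!event.connected) {
      this.status = { connected: false, connectedAt: null, directory: null };
      return;
    }
    this.status = {
      connected: true,
      // Keep a known timestamp only if we were already connected.
      connectedAt: this.status.connected ? this.status.connectedAt : null,
      directory: event.directory ?? this.status.directory,
    };
  }
}
```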