n8n/packages/@n8n/instance-ai/docs/ENGINEERING.md
oleg 629826ca1d
feat: Instance AI and local gateway modules (no-changelog) (#27206)
Signed-off-by: Oleg Ivaniv <me@olegivaniv.com>
Co-authored-by: Albert Alises <albert.alises@gmail.com>
Co-authored-by: Jaakko Husso <jaakko@n8n.io>
Co-authored-by: Dimitri Lavrenük <20122620+dlavrenuek@users.noreply.github.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Co-authored-by: Tuukka Kantola <Tuukkaa@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Mutasem Aldmour <4711238+mutdmour@users.noreply.github.com>
Co-authored-by: Raúl Gómez Morales <raul00gm@gmail.com>
Co-authored-by: Elias Meire <elias@meire.dev>
Co-authored-by: Dimitri Lavrenük <dimitri.lavrenuek@n8n.io>
Co-authored-by: Tomi Turtiainen <10324676+tomi@users.noreply.github.com>
Co-authored-by: Mutasem Aldmour <mutasem@n8n.io>
2026-04-01 21:33:38 +03:00

Engineering Standards

Concrete standards for Instance AI development. Every implementation ticket should follow these. When reviewing code, check against this list.

TypeScript

No escape hatches

Events flow from backend agents through the event bus to the frontend store and renderer. A single any or as cast breaks the chain — the compiler can no longer verify that every event type is handled everywhere. Strict typing means adding a new event type produces compile errors at every unhandled switch, not silent runtime bugs.

// NEVER
const result: any = await agent.stream(msg);
const data = response as ExecutionResult;

// INSTEAD — use the type system
const result: StreamResult<InstanceAiEvent> = await agent.stream(msg);
const data: ExecutionResult = parseExecutionResult(response);

  • No any: use unknown + type narrowing if the type is truly unknown
  • No as casts: use type guards, discriminated unions, or satisfies
  • Exhaustive switches for unions: the compiler catches missing cases
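
The three rules above can be sketched together as follows. This is a minimal illustration, not the package's actual code: the event shapes and the helpers `assertNever`, `isRecord`, and `isTextDeltaEvent` are hypothetical stand-ins for the real definitions.

```typescript
// Hypothetical event shapes for illustration only.
type TextDeltaEvent = { type: 'text-delta'; payload: { text: string } };
type ToolCallEvent = { type: 'tool-call'; payload: { toolName: string } };
type SketchEvent = TextDeltaEvent | ToolCallEvent;

// If a new member is added to SketchEvent and not handled in the switch
// below, the call to assertNever stops compiling.
function assertNever(value: never): never {
  throw new Error(`Unhandled event: ${JSON.stringify(value)}`);
}

function describeEvent(event: SketchEvent): string {
  switch (event.type) {
    case 'text-delta':
      return `text: ${event.payload.text}`;
    case 'tool-call':
      return `tool: ${event.payload.toolName}`;
    default:
      return assertNever(event);
  }
}

// Type guards narrow `unknown` without an `as` cast.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null;
}

function isTextDeltaEvent(value: unknown): value is TextDeltaEvent {
  if (!isRecord(value) || value['type'] !== 'text-delta') return false;
  const payload = value['payload'];
  return isRecord(payload) && typeof payload['text'] === 'string';
}

// Data of unknown shape is checked before it enters typed code.
const incoming: unknown = JSON.parse('{"type":"text-delta","payload":{"text":"hi"}}');
const summary = isTextDeltaEvent(incoming) ? describeEvent(incoming) : 'ignored';
```

The guard is the only place where the unknown value is inspected; everything downstream of it works with the narrowed union.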

Zod schemas are the source of truth

Every tool has an input schema (what the LLM sends) and an output schema (what the tool returns). Mastra uses these schemas to generate tool descriptions for the LLM, validate inputs at runtime, and type-check the execute function. If the TypeScript type and the Zod schema are defined separately, they drift — the LLM sees one contract, the code enforces another, and bugs hide until production.

// NEVER — separate schema and type that can drift
interface ListWorkflowsInput { query?: string; limit?: number; }
const schema = z.object({ query: z.string().optional(), limit: z.number().optional() });

// INSTEAD — infer the type from the schema
const listWorkflowsInputSchema = z.object({
  query: z.string().optional(),
  limit: z.number().int().min(1).max(100).default(50),
});
type ListWorkflowsInput = z.infer<typeof listWorkflowsInputSchema>;

This applies to tool schemas, event payloads, API request/response bodies, and plan state.

Discriminated unions for events

Each event type has a different payload shape. Discriminated unions let the compiler narrow the payload inside each case — no runtime checks, no possibility of accessing the wrong field. Adding a new event type to the union turns every unhandled switch into a compile error.

switch (event.type) {
  case 'text-delta':
    node.textContent += event.payload.text; // ← compiler knows this is string
    break;

  case 'tool-call':
    node.toolCalls.push({
      toolCallId: event.payload.toolCallId, // ← compiler knows this is string
      toolName: event.payload.toolName,
      ...
    });
    break;
}

Branded types for IDs

The event system passes runId, agentId, threadId, and toolCallId through the same functions — all strings. Branded types make the compiler catch swapped arguments that would otherwise be silent wrong-lookup bugs.

type RunId = string & { readonly __brand: 'RunId' };
type AgentId = string & { readonly __brand: 'AgentId' };
type ThreadId = string & { readonly __brand: 'ThreadId' };
type ToolCallId = string & { readonly __brand: 'ToolCallId' };

// Compiler prevents: findMessageByRunId(state, agentId)

Optional but valuable where multiple ID strings flow through the same code.
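
One way to create branded values without an `as` cast is to brand at the boundary with a type guard. The sketch below assumes hypothetical ID formats (the `run_` and `agent_` prefixes) and hypothetical helper names; the real IDs may have no checkable format at all.

```typescript
type RunId = string & { readonly __brand: 'RunId' };
type AgentId = string & { readonly __brand: 'AgentId' };

// Hypothetical ID formats: the prefixes are assumptions for this sketch.
function isRunId(value: string): value is RunId {
  return value.startsWith('run_');
}
function isAgentId(value: string): value is AgentId {
  return value.startsWith('agent_');
}

// The type guard narrows the plain string, so no cast is needed.
function toRunId(value: string): RunId {
  if (!isRunId(value)) throw new Error(`Not a run ID: ${value}`);
  return value;
}
function toAgentId(value: string): AgentId {
  if (!isAgentId(value)) throw new Error(`Not an agent ID: ${value}`);
  return value;
}

function findMessageByRunId(messages: Map<RunId, string>, runId: RunId): string | undefined {
  return messages.get(runId);
}

const runId = toRunId('run_1');
const agentId = toAgentId('agent_1');
const messages = new Map<RunId, string>([[runId, 'hello']]);

const found = findMessageByRunId(messages, runId);

// @ts-expect-error AgentId is not assignable to RunId
const swapped = findMessageByRunId(messages, agentId);
```

The `@ts-expect-error` line documents the swapped-argument mistake that the compiler now rejects.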

Testing

Test behavior, not implementation

The deep agent architecture will evolve rapidly — sub-agent mechanics, event bus internals, and reducer logic will change as we learn. Tests that assert on internal method calls break on every refactor. Tests that assert on observable outcomes survive refactors and catch real regressions.

// BAD — breaks when internals change
it('should call eventBus.publish with the right args', () => {
  expect(eventBus.publish).toHaveBeenCalledWith('thread-1', {
    type: 'tool-call', agentId: 'a1', ...
  });
});

// GOOD — tests what the user/frontend actually sees
it('should stream tool-call event when agent uses a tool', async () => {
  const events = await collectEvents(agent.stream('list my workflows'));
  const toolCall = events.find(e => e.type === 'tool-call');
  expect(toolCall).toBeDefined();
  expect(toolCall?.payload.toolName).toBe('list-workflows');
});

Test the contract, not the internals

The clean interface boundary (ADR-002) makes each layer testable in isolation. Verify the contract at each boundary — not the wiring between them. Tools can be tested without Mastra, the reducer without SSE, adapters without the agent.

For each tool, test:

  • Valid input → expected output shape
  • Invalid input → Zod validation error
  • Service method called with correct args (verify the interface boundary)
  • Error from service → tool error propagated correctly

For the event reducer, test:

  • Each event type mutates state correctly
  • Event ordering edge cases (e.g., tool-result before tool-call)
  • Mid-run replay creates placeholder correctly
  • Thread switch clears and replays
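
The ordering edge case above (tool-result before tool-call) can be sketched with a simplified reducer. The state shape, event shapes, and the choice of a pure function are all assumptions made for this illustration; the actual reducer may mutate a store and track far more fields.

```typescript
// Hypothetical, simplified shapes: the real state and events are richer.
type ToolCallState = { toolName?: string; result?: string };
type ReducerState = { text: string; toolCalls: Record<string, ToolCallState> };

type ReducerEvent =
  | { type: 'text-delta'; payload: { text: string } }
  | { type: 'tool-call'; payload: { toolCallId: string; toolName: string } }
  | { type: 'tool-result'; payload: { toolCallId: string; result: string } };

const initialState: ReducerState = { text: '', toolCalls: {} };

// Returns a new state and never mutates its input.
function reduce(state: ReducerState, event: ReducerEvent): ReducerState {
  switch (event.type) {
    case 'text-delta':
      return { ...state, text: state.text + event.payload.text };
    case 'tool-call': {
      const { toolCallId, toolName } = event.payload;
      // Merge into any placeholder left behind by an early tool-result.
      const existing = state.toolCalls[toolCallId] ?? {};
      return {
        ...state,
        toolCalls: { ...state.toolCalls, [toolCallId]: { ...existing, toolName } },
      };
    }
    case 'tool-result': {
      const { toolCallId, result } = event.payload;
      // Create a placeholder if the tool-call has not arrived yet.
      const existing = state.toolCalls[toolCallId] ?? {};
      return {
        ...state,
        toolCalls: { ...state.toolCalls, [toolCallId]: { ...existing, result } },
      };
    }
  }
}

// The result arrives before the call that produced it.
const outOfOrder: ReducerEvent[] = [
  { type: 'tool-result', payload: { toolCallId: 'tc-1', result: 'ok' } },
  { type: 'tool-call', payload: { toolCallId: 'tc-1', toolName: 'list-workflows' } },
];
const finalState = outOfOrder.reduce(reduce, initialState);
```

A test over this sketch asserts on `finalState` only, so it keeps passing if the reducer's internals are rewritten.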

Test edge cases that matter

The autonomous loop introduces failure modes that don't exist in simple request/response systems. Write tests for the scenarios that would be hardest to debug after the fact.

it('should handle run-finish after connection drop and reconnect', ...);
it('should not lose events when sub-agent completes during page reload', ...);
it('should reject delegate with MCP tool names', ...);
it('should not leak credentials in tool-call args for credential tools', ...);

No snapshot tests for dynamic data

Agent responses contain timestamps, generated IDs, and non-deterministic ordering. Snapshots against this data break constantly and get bulk-updated without review — they stop catching bugs. Use structural assertions that verify the shape and relationships you care about.

// BAD
expect(agentTree).toMatchSnapshot();

// GOOD
expect(agentTree.children).toHaveLength(1);
expect(agentTree.children[0].role).toBe('workflow builder');
expect(agentTree.children[0].status).toBe('completed');

DRY

Single source of truth

The same concepts (event types, tool schemas, replay rules) are used by backend, frontend, docs, and tickets. If a definition exists in two places, they diverge — we've already caught this multiple times during doc reviews. One canonical location per concept, everything else imports or references it.

| Concept | Source of truth | Consumers |
|---|---|---|
| Event types | @n8n/api-types TypeScript unions | Backend, frontend, docs |
| Tool schemas | Zod schemas in src/tools/ | Agent, tests, docs |
| Plan schema | Zod schema in src/tools/orchestration/ | Agent, frontend, docs |
| Config vars | @n8n/config class | Backend, docs |
| Replay rule | streaming-protocol.md canonical table | Frontend, backend, tickets |

Shared types in @n8n/api-types

Frontend and backend are separate packages but must agree on event shapes, API types, and status enums. Separate definitions drift silently — the backend emits status: "cancelled" while the frontend checks status: "canceled". Shared types make this a compile error.

// @n8n/api-types — single definition
export type InstanceAiEvent = RunStartEvent | RunFinishEvent | ...;

// Both sides import the same type
import type { InstanceAiEvent } from '@n8n/api-types';

Avoid parallel hierarchies

When backend and frontend both switch on event types with duplicated logic, a change to the format requires updating both in lockstep. Extract the shared part into @n8n/api-types or a shared utility.
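
As a rough sketch of extracting the shared part, both sides can call one helper instead of each keeping its own status check. The status names (other than `cancelled`) and all function names here are hypothetical.

```typescript
// Hypothetical shared module: in practice this would live in
// @n8n/api-types or a shared utility. Status names are assumptions.
type RunStatus = 'running' | 'completed' | 'cancelled' | 'failed';

// One definition of "is this run over?" for backend and frontend.
function isTerminalStatus(status: RunStatus): boolean {
  switch (status) {
    case 'running':
      return false;
    case 'completed':
    case 'cancelled':
    case 'failed':
      return true;
  }
}

// Backend side: decide whether to close the event stream.
function shouldCloseStream(status: RunStatus): boolean {
  return isTerminalStatus(status);
}

// Frontend side: decide whether to show the stop button.
function showStopButton(status: RunStatus): boolean {
  return !isTerminalStatus(status);
}
```

Adding a status to `RunStatus` then requires a change in one place, and the exhaustive switch flags it at compile time.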

Mastra Patterns

Tool definitions

Mastra uses Zod schemas for both runtime validation and LLM tool descriptions. The .describe() strings on schema fields become the parameter descriptions the LLM sees when deciding how to call a tool. Missing or vague descriptions lead to bad tool calls. The outputSchema lets Mastra validate return values and gives the LLM structured expectations.

  • Always define both inputSchema and outputSchema
  • Use .describe() on Zod fields: these are the LLM's parameter docs
  • Capture service context via closure in the factory function, not globals
  • Keep execute focused: delegate to service methods, no business logic in tools

export function createListWorkflowsTool(context: InstanceAiContext) {
  return createTool({
    id: 'list-workflows',
    description: 'List workflows accessible to the current user.',
    inputSchema: z.object({
      query: z.string().optional().describe('Filter workflows by name'),
      limit: z.number().int().min(1).max(100).default(50).describe('Max results'),
    }),
    outputSchema: z.object({
      workflows: z.array(workflowSummarySchema),
    }),
    execute: async ({ query, limit }) => {
      const workflows = await context.workflowService.list({ query, limit });
      return { workflows };
    },
  });
}

Memory usage

The memory system has distinct scopes with different lifecycles. Mixing them causes subtle bugs: storing a plan in working memory leaks it across conversations, writing observations from a sub-agent corrupts the orchestrator's context, and manually summarizing tool results fights with the Observer, which does the same thing.

  • Working memory is for user-scoped knowledge — not operational state
  • Never read/write memory from sub-agents — they're stateless by design
  • Let observational memory handle compression — don't manually summarize

Agent creation

Each request has its own user context (permissions, MCP config). Caching agents across requests risks serving wrong permissions. Sub-agents with the full tool set can call tools the orchestrator didn't intend, so the minimal tool set is both a security boundary and a context optimization.

  • Agent per request (ADR-003) — don't cache agent instances
  • Pass all context via the factory function — no ambient globals
  • Sub-agents get the minimum tool set needed
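
The rules above might look like this in outline. Every type and function name in the sketch is hypothetical; the real agent, tool, and context types come from Mastra and the Instance AI package and are not reproduced here.

```typescript
// Hypothetical shapes for illustration only.
type SketchTool = { id: string; run: (input: string) => string };
type RequestContext = { userId: string; canDelete: boolean };
type SketchAgent = { userId: string; toolIds: string[] };

// All context arrives through the factory argument, never from globals.
function createTools(context: RequestContext): SketchTool[] {
  const tools: SketchTool[] = [
    { id: 'list-workflows', run: () => `workflows for ${context.userId}` },
  ];
  if (context.canDelete) {
    tools.push({ id: 'delete-workflow', run: (workflowId) => `deleted ${workflowId}` });
  }
  return tools;
}

// Built fresh for every request, so permissions cannot go stale.
function createOrchestrator(context: RequestContext): SketchAgent {
  return { userId: context.userId, toolIds: createTools(context).map((tool) => tool.id) };
}

// A sub-agent receives only the tools it was explicitly granted.
function createSubAgent(context: RequestContext, allowedToolIds: string[]): SketchAgent {
  const toolIds = createTools(context)
    .map((tool) => tool.id)
    .filter((id) => allowedToolIds.includes(id));
  return { userId: context.userId, toolIds };
}

const viewerAgent = createOrchestrator({ userId: 'viewer', canDelete: false });
const adminContext: RequestContext = { userId: 'admin', canDelete: true };
const adminAgent = createOrchestrator(adminContext);
const readOnlySubAgent = createSubAgent(adminContext, ['list-workflows']);
```

Even for the admin request, the sub-agent never sees `delete-workflow` because it was not in the granted list.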

Abstractions

Right level of abstraction

The clean interface boundary (ADR-002) keeps the agent core free of n8n dependencies — testable in isolation and potentially reusable outside n8n. Skipping a layer breaks testability. Adding an unnecessary layer adds indirection without value.

Tool (thin wrapper)  →  Service interface  →  Adapter (n8n bridge)  →  n8n internals
     Zod schemas          Pure TypeScript        DI + permissions        Framework-specific

  • Tools: validate input, call service, return output
  • Service interfaces: pure TypeScript, no n8n imports
  • Adapters: permissions, data transformation, error mapping
  • Don't skip layers, don't add unnecessary ones
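
The layering can be sketched with an in-memory fake standing in for the adapter, which is what makes the tool testable without n8n or Mastra. All names are hypothetical, the manual `limit` check stands in for the Zod input schema, and the calls are synchronous only for brevity (real service calls would return promises).

```typescript
type WorkflowSummary = { id: string; name: string };
type ListOptions = { query?: string; limit: number };

// Service interface: pure TypeScript, no n8n imports.
interface WorkflowService {
  list(options: ListOptions): WorkflowSummary[];
}

// Tool: validate input, call the service, return output. No business logic.
function createListWorkflowsToolSketch(service: WorkflowService) {
  return {
    id: 'list-workflows',
    execute(input: { query?: string; limit?: number }): { workflows: WorkflowSummary[] } {
      const limit = input.limit ?? 50;
      // Stand-in for schema validation.
      if (!Number.isInteger(limit) || limit < 1 || limit > 100) {
        throw new Error('limit must be an integer between 1 and 100');
      }
      return { workflows: service.list({ query: input.query, limit }) };
    },
  };
}

// Adapter stand-in: the real adapter would bridge to n8n internals and
// enforce permissions. This fake records calls so a test can verify
// the interface boundary.
function createFakeWorkflowService(all: WorkflowSummary[]) {
  const receivedOptions: ListOptions[] = [];
  const service: WorkflowService = {
    list(options) {
      receivedOptions.push(options);
      return all
        .filter((workflow) => workflow.name.includes(options.query ?? ''))
        .slice(0, options.limit);
    },
  };
  return { service, receivedOptions };
}

const fake = createFakeWorkflowService([
  { id: '1', name: 'Daily report' },
  { id: '2', name: 'Sync contacts' },
]);
const listTool = createListWorkflowsToolSketch(fake.service);
const output = listTool.execute({ query: 'report' });
```

Swapping the fake for a real adapter changes nothing in the tool, which is the point of keeping the service interface free of framework types.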

Abstract over transport, not around it

n8n runs in single-instance mode (in-process) and in queue mode (Redis). The same agent code must work in both without knowing which. If the interface leaks transport details, every event publisher needs Redis knowledge and testing locally requires a Redis dependency. Domain-level interfaces keep agent code portable and tests simple.

// GOOD — domain-level
publish(threadId: string, event: InstanceAiEvent): void;
subscribe(threadId: string, handler: (event: InstanceAiEvent) => void): Unsubscribe;

// BAD — transport leaked
publish(channel: string, message: string): void;
subscribe(channel: string, callback: (channel: string, message: string) => void): void;
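
A minimal in-process implementation of the domain-level interface could look like the sketch below. The `BusEvent` shape, the `EventBus` interface name, and the `InProcessEventBus` class are hypothetical; a Redis-backed class for queue mode would implement the same interface and is not shown.

```typescript
// Hypothetical event shape and names, for illustration only.
type BusEvent = { type: string; payload: unknown };
type Handler = (event: BusEvent) => void;
type Unsubscribe = () => void;

interface EventBus {
  publish(threadId: string, event: BusEvent): void;
  subscribe(threadId: string, handler: Handler): Unsubscribe;
}

// Single-instance transport: handlers are called directly in-process.
class InProcessEventBus implements EventBus {
  private readonly handlers = new Map<string, Set<Handler>>();

  publish(threadId: string, event: BusEvent): void {
    this.handlers.get(threadId)?.forEach((handler) => handler(event));
  }

  subscribe(threadId: string, handler: Handler): Unsubscribe {
    const forThread = this.handlers.get(threadId) ?? new Set<Handler>();
    forThread.add(handler);
    this.handlers.set(threadId, forThread);
    return () => {
      forThread.delete(handler);
    };
  }
}

// Callers depend on the interface, not on the transport.
const bus: EventBus = new InProcessEventBus();
const received: string[] = [];
const unsubscribe = bus.subscribe('thread-1', (busEvent) => {
  received.push(busEvent.type);
});

bus.publish('thread-1', { type: 'run-start', payload: {} });
bus.publish('thread-2', { type: 'run-start', payload: {} }); // other thread, not delivered
unsubscribe();
bus.publish('thread-1', { type: 'run-finish', payload: {} }); // after unsubscribe, not delivered
```

Tests written against `EventBus` run without Redis, and the same tests can later be pointed at another implementation.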

Don't abstract prematurely

This project is built with AI tools, which tend to over-abstract. The autonomous loop design is still evolving — a premature abstraction becomes a constraint rather than an enabler.

  • Three similar lines are better than a premature helper
  • Don't extract until the pattern repeats 3+ times
  • Don't wrap framework primitives before the API is stable
  • Let patterns emerge from implementation, then extract

Standard Acceptance Criteria

Every implementation ticket should include these in addition to its feature-specific ACs:

## Standard ACs (all tickets)

- [ ] No `any` types or `as` casts in new code
- [ ] Types inferred from Zod schemas where applicable
- [ ] Tests cover behavior (not implementation), including edge cases
- [ ] No type/schema duplication — shared definitions in `@n8n/api-types`
- [ ] Typecheck passes (`pnpm typecheck` in package directory)
- [ ] Lint passes (`pnpm lint` in package directory)