n8n/packages/@n8n/instance-ai/docs/memory.md
oleg 629826ca1d
feat: Instance AI and local gateway modules (no-changelog) (#27206)
Signed-off-by: Oleg Ivaniv <me@olegivaniv.com>
Co-authored-by: Albert Alises <albert.alises@gmail.com>
Co-authored-by: Jaakko Husso <jaakko@n8n.io>
Co-authored-by: Dimitri Lavrenük <20122620+dlavrenuek@users.noreply.github.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Co-authored-by: Tuukka Kantola <Tuukkaa@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Mutasem Aldmour <4711238+mutdmour@users.noreply.github.com>
Co-authored-by: Raúl Gómez Morales <raul00gm@gmail.com>
Co-authored-by: Elias Meire <elias@meire.dev>
Co-authored-by: Dimitri Lavrenük <dimitri.lavrenuek@n8n.io>
Co-authored-by: Tomi Turtiainen <10324676+tomi@users.noreply.github.com>
Co-authored-by: Mutasem Aldmour <mutasem@n8n.io>
2026-04-01 21:33:38 +03:00


# Memory System

## Overview

The memory system serves three distinct purposes:

- **Long-term user knowledge** — working memory that persists the agent's understanding of the user, their preferences, and instance knowledge across all conversations (user-scoped)
- **Operational context management** — observational memory that compresses the agent's operational history during long autonomous loops to prevent context degradation (thread-scoped)
- **Conversation history** — recent messages and semantic recall for the current thread (thread-scoped)

Sub-agents currently have working memory disabled (`workingMemoryEnabled: false`). They are stateless — context is passed via the briefing only.

## Tiers

### Tier 1: Storage Backend

The persistence layer. Stores all messages, working memory state, observational memory, plan state, event history, and vector embeddings.

| Backend | When used | Connection |
| --- | --- | --- |
| PostgreSQL | n8n is configured with `postgresdb` | Built from n8n's DB config |
| LibSQL/SQLite | All other cases (default) | `file:instance-ai-memory.db` |

The storage backend is selected automatically based on n8n's database configuration — no separate config needed.
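The selection rule can be sketched roughly as follows. Note that the `DatabaseConfig` shape and the `selectStorageUrl` helper are illustrative assumptions, not n8n's actual config API:

```typescript
// Illustrative sketch of automatic backend selection. The DatabaseConfig
// shape and helper name are assumptions; the real n8n config object differs.
interface DatabaseConfig {
  type: 'postgresdb' | 'sqlite' | 'mysqldb';
  host?: string;
  port?: number;
  database?: string;
}

function selectStorageUrl(db: DatabaseConfig): string {
  if (db.type === 'postgresdb') {
    // Reuse n8n's Postgres connection details for the memory store.
    return `postgres://${db.host}:${db.port}/${db.database}`;
  }
  // All other cases fall back to a local LibSQL/SQLite file.
  return 'file:instance-ai-memory.db';
}
```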

### Tier 2: Recent Messages

A sliding window of the most recent N messages in the conversation, sent as context to the LLM on every request.

- Default: 20 messages
- Config: `N8N_INSTANCE_AI_LAST_MESSAGES`
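The window itself is trivially simple; a minimal sketch (function name illustrative):

```typescript
// Minimal sketch of the recent-message window. The default of 20 mirrors
// N8N_INSTANCE_AI_LAST_MESSAGES; the function name is illustrative.
function recentWindow<T>(messages: T[], lastMessages = 20): T[] {
  // Only the newest `lastMessages` entries are sent to the LLM.
  return messages.slice(-lastMessages);
}
```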

### Tier 3: Working Memory

A structured markdown template that the agent can update during conversation. It persists information the agent learns about the user and their instance across messages. Working memory is user-scoped — it carries across threads.

```md
# User Context
- **Name**:
- **Role**:
- **Organization**:

# Workflow Preferences
- **Preferred trigger types**:
- **Common integrations used**:
- **Workflow naming conventions**:
- **Error handling patterns**:

# Current Goals
- **Active project/task**:
- **Known issues being debugged**:
- **Pending workflow changes**:

# Instance Knowledge
- **Frequently used credentials**:
- **Key workflow IDs and names**:
- **Custom node types available**:
```

The agent fills this in over time as it learns about the user. Working memory is included in every request, giving the agent persistent context beyond the recent message window.
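Because working memory is keyed by user rather than thread, its persistence can be pictured with a toy in-memory sketch (the real store is the Tier 1 storage backend; all names here are illustrative):

```typescript
// Toy sketch: working memory keyed by user (resource), not by thread.
// The real implementation persists to the Tier 1 storage backend.
const DEFAULT_TEMPLATE = '# User Context\n- **Name**:\n- **Role**:';
const workingMemory = new Map<string, string>();

function saveWorkingMemory(userId: string, markdown: string): void {
  // The agent rewrites the whole template; the latest version wins.
  workingMemory.set(userId, markdown);
}

function loadWorkingMemory(userId: string): string {
  // New users start from the blank template.
  return workingMemory.get(userId) ?? DEFAULT_TEMPLATE;
}
```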

### Tier 4: Observational Memory

Automatic context compression for long-running autonomous loops. Two background agents manage the orchestrator's context size:

- **Observer** — when message tokens exceed a threshold (default: 30K), compresses old messages into dense observations
- **Reflector** — when observations exceed their threshold (default: 40K), condenses observations into higher-level patterns

Context window layout during the autonomous loop:

```
┌──────────────────────────────────────────┐
│ Observation Block (≤40K tokens)          │  ← compressed history
│ "Built wf-123 with Schedule→HTTP→Slack.  │     (append-only, cacheable)
│  Exec failed: 401 on HTTP node.          │
│  Debugger identified missing API key.    │
│  Rebuilt workflow, re-executed, passed." │
├──────────────────────────────────────────┤
│ Raw Message Block (≤30K tokens)          │  ← recent tool calls & results
│ [current step's tool calls and results]  │     (rotated as new messages arrive)
└──────────────────────────────────────────┘
```

Why this matters for the autonomous loop:

- Tool-heavy workloads (workflow definitions, execution results, node descriptions) get 5–40x compression — a 50-step loop that would otherwise blow out the context window stays manageable
- The observation block is append-only until reflection runs, enabling high prompt cache hit rates (4–10x cost reduction)
- Async buffering pre-computes observations in the background — no user-visible pause when the threshold is hit
- Uses a secondary LLM (default: `google/gemini-2.5-flash`) for compression — cheap, with a 1M-token context window for the Reflector

Observational memory is thread-scoped — it tracks the operational history of the current task, not long-term user knowledge (that's working memory's job).
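The two triggers can be sketched as simple guards; token counting and the compression LLM calls themselves are stubbed out here, and the function names are illustrative:

```typescript
// Sketch of the two compression triggers described above. Defaults mirror
// N8N_INSTANCE_AI_OBSERVER_MESSAGE_TOKENS and
// N8N_INSTANCE_AI_REFLECTOR_OBSERVATION_TOKENS.
const OBSERVER_MESSAGE_TOKENS = 30_000;
const REFLECTOR_OBSERVATION_TOKENS = 40_000;

// Observer: compress raw messages once they outgrow their block.
function shouldObserve(messageTokens: number): boolean {
  return messageTokens > OBSERVER_MESSAGE_TOKENS;
}

// Reflector: condense observations once they outgrow their block.
function shouldReflect(observationTokens: number): boolean {
  return observationTokens > REFLECTOR_OBSERVATION_TOKENS;
}
```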

### Tier 5: Semantic Recall (Optional)

Vector-based retrieval of relevant past messages. When enabled, the system embeds each message and retrieves semantically similar past messages to include as context.

- Requires: `N8N_INSTANCE_AI_EMBEDDER_MODEL` to be set
- Config: `N8N_INSTANCE_AI_SEMANTIC_RECALL_TOP_K` (default: 5)
- Message range: 2 messages before and 1 after each match

Disabled by default. When the embedder model is not set, only tiers 1–4 are active.
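The match-expansion rule (2 messages before, 1 after) can be sketched as follows; the function name is illustrative and indices are message positions:

```typescript
// Sketch of expanding each semantic match into its surrounding messages
// (2 before, 1 after), deduplicated and returned in order.
function expandMatches(matchIndexes: number[], totalMessages: number): number[] {
  const included = new Set<number>();
  for (const i of matchIndexes) {
    // Clamp the window to valid message positions.
    for (let j = Math.max(0, i - 2); j <= Math.min(totalMessages - 1, i + 1); j++) {
      included.add(j);
    }
  }
  return [...included].sort((a, b) => a - b);
}
```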

### Tier 6: Plan Storage

The plan tool stores execution plans in thread-scoped storage. Plans are structured data (goal, current phase, iteration count, step statuses) that persist across reconnects within a conversation. See the tools documentation for the plan tool schema.
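An illustrative shape for a stored plan — the field names here are assumptions; see the tools documentation for the actual schema:

```typescript
// Illustrative plan shape; the actual schema lives with the plan tool.
type StepStatus = 'pending' | 'in_progress' | 'done' | 'failed';

interface StoredPlan {
  goal: string;
  currentPhase: string;
  iterationCount: number;
  steps: Array<{ description: string; status: StepStatus }>;
}

// Plans survive reconnects within a thread, so progress can be resumed.
const plan: StoredPlan = {
  goal: 'Build and verify a Slack notification workflow',
  currentPhase: 'verify',
  iterationCount: 2,
  steps: [
    { description: 'Create workflow', status: 'done' },
    { description: 'Execute and check output', status: 'in_progress' },
  ],
};
```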

## Scoping Model

Memory is scoped to two dimensions:

```ts
agent.stream(message, {
  memory: {
    resource: userId,    // User-level — working memory lives here
    thread: threadId,    // Thread-level — messages, observations, plan live here
  },
});
```

### What's user-scoped (persists across threads)

- **Working memory** — the agent's accumulated understanding of the user (preferences, frequently used workflows, instance knowledge)

### What's thread-scoped (isolated per conversation)

- **Recent messages** — the sliding window of N messages
- **Observational memory** — compressed operational history
- **Semantic recall** — vector retrieval of relevant past messages
- **Plan** — the current execution plan

### Sub-agent memory

Sub-agents currently have working memory disabled. They are fully stateless — context is passed via the `briefing` and `conversationContext` fields in the `delegate` and `build-workflow-with-agent` tools.

Past failed attempts are tracked via the `IterationLog` (stored in thread metadata) and appended to sub-agent briefings on retry, providing cross-attempt context without persistent memory.
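The retry flow can be pictured roughly like this; the `IterationEntry` shape and the helper are hypothetical, not the actual IterationLog API:

```typescript
// Hypothetical sketch: prior failed attempts from the IterationLog are
// appended to the sub-agent briefing on retry. Field names are assumptions.
interface IterationEntry {
  attempt: number;
  failure: string;
}

function briefingWithHistory(briefing: string, log: IterationEntry[]): string {
  if (log.length === 0) return briefing;
  const history = log
    .map((e) => `Attempt ${e.attempt} failed: ${e.failure}`)
    .join('\n');
  // The sub-agent stays stateless; history travels inside the briefing.
  return `${briefing}\n\nPrevious attempts:\n${history}`;
}
```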

### Cross-user isolation

Each user's memory is fully independent. The agent cannot see other users' conversations, working memory, or semantic history.

## Working Memory vs. Observational Memory

These serve different purposes and both are active simultaneously:

| Aspect | Working Memory | Observational Memory |
| --- | --- | --- |
| Scope | User-scoped | Thread-scoped |
| Content | User preferences, instance knowledge | Compressed operational history |
| Lifecycle | Persists forever, across all threads | Lives with the conversation |
| Updated by | Agent (explicit writes) | Background Observer/Reflector (automatic) |
| Example | "User prefers Slack, uses cred-1" | "Built wf-123, exec failed, fixed HTTP auth" |

## Configuration

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| `N8N_INSTANCE_AI_LAST_MESSAGES` | number | `20` | Recent message window |
| `N8N_INSTANCE_AI_EMBEDDER_MODEL` | string | `''` | Embedder model (empty = disabled) |
| `N8N_INSTANCE_AI_SEMANTIC_RECALL_TOP_K` | number | `5` | Number of semantic matches |
| `N8N_INSTANCE_AI_OBSERVER_MODEL` | string | `google/gemini-2.5-flash` | LLM for Observer/Reflector |
| `N8N_INSTANCE_AI_OBSERVER_MESSAGE_TOKENS` | number | `30000` | Observer trigger threshold |
| `N8N_INSTANCE_AI_REFLECTOR_OBSERVATION_TOKENS` | number | `40000` | Reflector trigger threshold |