mirror of
https://github.com/n8n-io/n8n
synced 2026-04-21 15:47:20 +00:00
Signed-off-by: Oleg Ivaniv <me@olegivaniv.com> Co-authored-by: Albert Alises <albert.alises@gmail.com> Co-authored-by: Jaakko Husso <jaakko@n8n.io> Co-authored-by: Dimitri Lavrenük <20122620+dlavrenuek@users.noreply.github.com> Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com> Co-authored-by: Tuukka Kantola <Tuukkaa@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Mutasem Aldmour <4711238+mutdmour@users.noreply.github.com> Co-authored-by: Raúl Gómez Morales <raul00gm@gmail.com> Co-authored-by: Elias Meire <elias@meire.dev> Co-authored-by: Dimitri Lavrenük <dimitri.lavrenuek@n8n.io> Co-authored-by: Tomi Turtiainen <10324676+tomi@users.noreply.github.com> Co-authored-by: Mutasem Aldmour <mutasem@n8n.io>
# Memory System

## Overview

The memory system serves three distinct purposes:

- **Long-term user knowledge**: working memory that persists the agent's
  understanding of the user, their preferences, and instance knowledge across
  all conversations (user-scoped)
- **Operational context management**: observational memory that compresses
  the agent's operational history during long autonomous loops to prevent
  context degradation (thread-scoped)
- **Conversation history**: recent messages and semantic recall for the
  current thread (thread-scoped)

Sub-agents currently have working memory **disabled**
(`workingMemoryEnabled: false`). They are stateless: context is passed via the
briefing only.

## Tiers

### Tier 1: Storage Backend

The persistence layer. Stores all messages, working memory state, observational
memory, plan state, event history, and vector embeddings.

| Backend | When Used | Connection |
|---------|-----------|------------|
| PostgreSQL | n8n is configured with `postgresdb` | Built from n8n's DB config |
| LibSQL/SQLite | All other cases (default) | `file:instance-ai-memory.db` |

The storage backend is selected automatically based on n8n's database
configuration; no separate config needed.
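
The selection rule in the table can be sketched as a small pure function. This is an illustration only, not n8n's actual code: `selectMemoryStorage`, `PostgresSettings`, and the connection-string format are assumptions. Only the `postgresdb` value and the `file:instance-ai-memory.db` URL come from the table above.

```typescript
// Hypothetical shape of the Postgres settings taken from n8n's DB config.
interface PostgresSettings {
  host: string;
  port: number;
  database: string;
  user: string;
}

type MemoryStorageConfig =
  | { backend: 'postgres'; connectionString: string }
  | { backend: 'libsql'; url: string };

// Pick the memory backend from n8n's database type.
function selectMemoryStorage(dbType: string, pg?: PostgresSettings): MemoryStorageConfig {
  if (dbType === 'postgresdb') {
    if (!pg) throw new Error('postgresdb selected but no Postgres settings provided');
    const user = encodeURIComponent(pg.user);
    return {
      backend: 'postgres',
      // Assumed URL format; the real connection may be built differently.
      connectionString: `postgresql://${user}@${pg.host}:${pg.port}/${pg.database}`,
    };
  }
  // All other cases fall back to the local LibSQL/SQLite file.
  return { backend: 'libsql', url: 'file:instance-ai-memory.db' };
}
```

Keeping the decision in one function that depends only on the database type mirrors the "no separate config" behaviour described above.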

### Tier 2: Recent Messages

A sliding window of the most recent N messages in the conversation, sent as
context to the LLM on every request.

- **Default**: 20 messages
- **Config**: `N8N_INSTANCE_AI_LAST_MESSAGES`
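
A sliding window of this kind amounts to taking the tail of the message history. The sketch below assumes a simplified `ChatMessage` shape and a hypothetical `recentMessages` helper; neither name comes from the n8n codebase.

```typescript
// Simplified, hypothetical message shape used only for this example.
interface ChatMessage {
  role: 'user' | 'assistant' | 'tool';
  content: string;
}

// Default from the section above; configurable via N8N_INSTANCE_AI_LAST_MESSAGES.
const DEFAULT_LAST_MESSAGES = 20;

// Return the most recent `lastMessages` entries, oldest first.
function recentMessages(
  history: readonly ChatMessage[],
  lastMessages: number = DEFAULT_LAST_MESSAGES,
): ChatMessage[] {
  if (lastMessages <= 0) return [];
  return history.slice(-lastMessages);
}
```

With 25 messages in a thread and the default window, only the last 20 are sent; older messages are reachable only through the other tiers.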

### Tier 3: Working Memory

A structured markdown template that the agent can update during conversation.
It persists information the agent learns about the user and their instance
across messages. Working memory is **user-scoped**: it carries across threads.

```markdown
# User Context
- **Name**:
- **Role**:
- **Organization**:

# Workflow Preferences
- **Preferred trigger types**:
- **Common integrations used**:
- **Workflow naming conventions**:
- **Error handling patterns**:

# Current Goals
- **Active project/task**:
- **Known issues being debugged**:
- **Pending workflow changes**:

# Instance Knowledge
- **Frequently used credentials**:
- **Key workflow IDs and names**:
- **Custom node types available**:
```

The agent fills this in over time as it learns about the user. Working memory
is included in every request, giving the agent persistent context beyond the
recent message window.

### Tier 4: Observational Memory

Automatic context compression for long-running autonomous loops. Two background
agents manage the orchestrator's context size:

- **Observer**: when message tokens exceed a threshold (default: 30K),
  compresses old messages into dense observations
- **Reflector**: when observations exceed their threshold (default: 40K),
  condenses observations into higher-level patterns

```
Context window layout during autonomous loop:

┌────────────────────────────────────────────┐
│  Observation Block (≤40K tokens)           │  ← compressed history
│  "Built wf-123 with Schedule→HTTP→Slack.   │    (append-only, cacheable)
│   Exec failed: 401 on HTTP node.           │
│   Debugger identified missing API key.     │
│   Rebuilt workflow, re-executed, passed."  │
├────────────────────────────────────────────┤
│  Raw Message Block (≤30K tokens)           │  ← recent tool calls & results
│  [current step's tool calls and results]   │    (rotated as new messages arrive)
└────────────────────────────────────────────┘
```

**Why this matters for the autonomous loop**:

- Tool-heavy workloads (workflow definitions, execution results, node
  descriptions) get **5–40x compression**, so a 50-step loop that would
  otherwise blow out the context window stays manageable
- The observation block is **append-only** until reflection runs, enabling
  high prompt cache hit rates (4–10x cost reduction)
- **Async buffering** pre-computes observations in the background, so there is
  no user-visible pause when the threshold is hit
- Compression uses a secondary LLM (default: `google/gemini-2.5-flash`), which
  is cheap and has a 1M-token context window for the Reflector

Observational memory is **thread-scoped**: it tracks the operational history
of the current task, not long-term user knowledge (that's working memory's job).
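
The two triggers described in this tier reduce to a pair of threshold checks. The following is a minimal sketch under stated assumptions: the `ContextState` and `CompressionThresholds` types and the `pendingCompressionSteps` function are hypothetical, and token counting is taken as given. Only the 30K and 40K defaults come from this document.

```typescript
// Hypothetical snapshot of the orchestrator's context, measured in tokens.
interface ContextState {
  messageTokens: number;     // size of the raw message block
  observationTokens: number; // size of the observation block
}

interface CompressionThresholds {
  observerMessageTokens: number;
  reflectorObservationTokens: number;
}

// Defaults documented above (30K for the Observer, 40K for the Reflector).
const DEFAULT_THRESHOLDS: CompressionThresholds = {
  observerMessageTokens: 30000,
  reflectorObservationTokens: 40000,
};

type CompressionStep = 'observe' | 'reflect';

// Decide which background agents should run for the current context size.
function pendingCompressionSteps(
  state: ContextState,
  thresholds: CompressionThresholds = DEFAULT_THRESHOLDS,
): CompressionStep[] {
  const steps: CompressionStep[] = [];
  // Observer: raw messages grew past their budget, compress the oldest ones.
  if (state.messageTokens > thresholds.observerMessageTokens) steps.push('observe');
  // Reflector: observations grew past their budget, condense them.
  if (state.observationTokens > thresholds.reflectorObservationTokens) steps.push('reflect');
  return steps;
}
```

The sketch only covers the trigger decision; the async buffering mentioned above would run these steps ahead of time rather than at the moment a threshold is crossed.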

### Tier 5: Semantic Recall (Optional)

Vector-based retrieval of relevant past messages. When enabled, the system
embeds each message and retrieves semantically similar past messages to include
as context.

- **Requires**: `N8N_INSTANCE_AI_EMBEDDER_MODEL` to be set
- **Config**: `N8N_INSTANCE_AI_SEMANTIC_RECALL_TOP_K` (default: 5)
- **Message range**: 2 messages before and 1 after each match

Disabled by default. When the embedder model is not set, only tiers 1–4 are
active.
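
To make the top-K and message-range settings concrete, here is a small in-memory sketch. It assumes cosine similarity as the ranking metric and a hypothetical `EmbeddedMessage` type; the document does not specify either. A real deployment would query the vector store rather than scan an array.

```typescript
// Hypothetical stored message with a precomputed embedding.
interface EmbeddedMessage {
  id: string;
  embedding: number[];
}

// Cosine similarity between two vectors (0 when either has zero length).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Return thread indices to include: the topK best matches, each expanded
// by `before` preceding and `after` following messages, in thread order.
function recallIndices(
  query: number[],
  thread: readonly EmbeddedMessage[],
  topK = 5,
  before = 2,
  after = 1,
): number[] {
  const ranked = thread
    .map((m, index) => ({ index, score: cosineSimilarity(query, m.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);

  const selected = new Set<number>();
  for (const { index } of ranked) {
    const start = Math.max(0, index - before);
    const end = Math.min(thread.length - 1, index + after);
    for (let i = start; i <= end; i++) selected.add(i);
  }
  return Array.from(selected).sort((x, y) => x - y);
}
```

Collecting indices in a set deduplicates overlapping ranges, so two nearby matches do not cause the same message to be included twice.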

### Tier 6: Plan Storage

The `plan` tool stores execution plans in thread-scoped storage. Plans are
structured data (goal, current phase, iteration count, step statuses) that
persist across reconnects within a conversation. See the [tools](./tools.md)
documentation for the plan tool schema.

## Scoping Model

Memory is scoped along two dimensions:

```typescript
agent.stream(message, {
  memory: {
    resource: userId,  // User-level: working memory lives here
    thread: threadId,  // Thread-level: messages, observations, plan live here
  },
});
```

### What's user-scoped (persists across threads)

- **Working memory**: the agent's accumulated understanding of the user
  (preferences, frequently used workflows, instance knowledge)

### What's thread-scoped (isolated per conversation)

- **Recent messages**: the sliding window of N messages
- **Observational memory**: compressed operational history
- **Semantic recall**: vector retrieval of relevant past messages
- **Plan**: the current execution plan
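
One way to picture the two scopes is as two different storage keys: user-scoped data is keyed by `resource` alone, thread-scoped data by the `resource` and `thread` pair. The in-memory `ScopedMemoryStore` below is a hypothetical illustration of that keying, not the actual storage layer.

```typescript
interface MemoryScope {
  resource: string; // user ID
  thread: string;   // conversation ID
}

// Hypothetical in-memory store that demonstrates the two scoping levels.
class ScopedMemoryStore {
  private readonly workingMemory = new Map<string, string>();
  private readonly threadMessages = new Map<string, string[]>();

  // Thread-scoped data is keyed by both user and thread.
  private static threadKey(scope: MemoryScope): string {
    return JSON.stringify([scope.resource, scope.thread]);
  }

  // User-scoped: keyed by resource only, so it is shared across threads.
  setWorkingMemory(scope: MemoryScope, markdown: string): void {
    this.workingMemory.set(scope.resource, markdown);
  }

  getWorkingMemory(scope: MemoryScope): string | undefined {
    return this.workingMemory.get(scope.resource);
  }

  // Thread-scoped: isolated per conversation.
  appendMessage(scope: MemoryScope, text: string): void {
    const key = ScopedMemoryStore.threadKey(scope);
    const existing = this.threadMessages.get(key) ?? [];
    this.threadMessages.set(key, [...existing, text]);
  }

  getMessages(scope: MemoryScope): string[] {
    return this.threadMessages.get(ScopedMemoryStore.threadKey(scope)) ?? [];
  }
}
```

Working memory written in one thread is visible from any other thread of the same user, while messages stay in the thread that produced them.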

### Sub-agent memory

Sub-agents currently have working memory **disabled**. They are fully
stateless: context is passed via the briefing and `conversationContext` fields
in the `delegate` and `build-workflow-with-agent` tools.

Past failed attempts are tracked via the `IterationLog` (stored in thread
metadata) and appended to sub-agent briefings on retry, providing cross-attempt
context without persistent memory.
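
The retry behaviour can be illustrated with a short sketch. The entry shape and the briefing layout below are assumptions made for the example; the real `IterationLog` format is not described in this document.

```typescript
// Hypothetical shape of one logged attempt; the real IterationLog may differ.
interface IterationEntry {
  attempt: number;
  error: string;
}

// Append earlier failures to the task briefing handed to a stateless sub-agent.
function buildRetryBriefing(task: string, failedAttempts: readonly IterationEntry[]): string {
  if (failedAttempts.length === 0) return task;
  const lines = failedAttempts.map((entry) => `- Attempt ${entry.attempt}: ${entry.error}`);
  return [task, '', 'Previous failed attempts:', ...lines].join('\n');
}
```

Because the failure history travels inside the briefing text, the sub-agent needs no memory of its own to avoid repeating a known mistake.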

### Cross-user isolation

Each user's memory is fully independent. The agent cannot see other users'
conversations, working memory, or semantic history.

## Working Memory vs. Observational Memory

These serve different purposes and both are active simultaneously:

| Aspect | Working Memory | Observational Memory |
|--------|---------------|---------------------|
| **Scope** | User-scoped | Thread-scoped |
| **Content** | User preferences, instance knowledge | Compressed operational history |
| **Lifecycle** | Persists forever, across all threads | Lives with the conversation |
| **Updated by** | Agent (explicit writes) | Background Observer/Reflector (automatic) |
| **Example** | "User prefers Slack, uses cred-1" | "Built wf-123, exec failed, fixed HTTP auth" |

## Configuration

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `N8N_INSTANCE_AI_LAST_MESSAGES` | number | 20 | Recent message window |
| `N8N_INSTANCE_AI_EMBEDDER_MODEL` | string | `''` | Embedder model (empty = disabled) |
| `N8N_INSTANCE_AI_SEMANTIC_RECALL_TOP_K` | number | 5 | Number of semantic matches |
| `N8N_INSTANCE_AI_OBSERVER_MODEL` | string | `google/gemini-2.5-flash` | LLM for Observer/Reflector |
| `N8N_INSTANCE_AI_OBSERVER_MESSAGE_TOKENS` | number | 30000 | Observer trigger threshold |
| `N8N_INSTANCE_AI_REFLECTOR_OBSERVATION_TOKENS` | number | 40000 | Reflector trigger threshold |
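
As a reading aid, the table can be expressed as a typed loader that applies the documented defaults. This is a sketch, not n8n's configuration code: `InstanceAiMemoryConfig`, `loadMemoryConfig`, and `semanticRecallEnabled` are hypothetical names, and the fallback-on-invalid-number behaviour is an assumption. Only the variable names and default values are taken from the table.

```typescript
interface InstanceAiMemoryConfig {
  lastMessages: number;
  embedderModel: string;
  semanticRecallTopK: number;
  observerModel: string;
  observerMessageTokens: number;
  reflectorObservationTokens: number;
}

type Env = Record<string, string | undefined>;

// Parse a numeric variable, falling back to the default when unset or invalid.
function readNumber(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === '') return fallback;
  const value = Number(raw);
  return Number.isFinite(value) ? value : fallback;
}

// Build the memory config from environment variables (e.g. pass process.env).
function loadMemoryConfig(env: Env): InstanceAiMemoryConfig {
  return {
    lastMessages: readNumber(env, 'N8N_INSTANCE_AI_LAST_MESSAGES', 20),
    embedderModel: env['N8N_INSTANCE_AI_EMBEDDER_MODEL'] ?? '',
    semanticRecallTopK: readNumber(env, 'N8N_INSTANCE_AI_SEMANTIC_RECALL_TOP_K', 5),
    observerModel: env['N8N_INSTANCE_AI_OBSERVER_MODEL'] ?? 'google/gemini-2.5-flash',
    observerMessageTokens: readNumber(env, 'N8N_INSTANCE_AI_OBSERVER_MESSAGE_TOKENS', 30000),
    reflectorObservationTokens: readNumber(env, 'N8N_INSTANCE_AI_REFLECTOR_OBSERVATION_TOKENS', 40000),
  };
}

// Semantic recall is active only when an embedder model is configured.
function semanticRecallEnabled(config: InstanceAiMemoryConfig): boolean {
  return config.embedderModel !== '';
}
```

Taking the environment as a parameter keeps the loader easy to test: an empty object yields exactly the defaults listed in the table.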