n8n/packages/testing/performance/README.md

# Performance Benchmarks

Microbenchmarks for measuring and tracking performance of critical code paths.

## When to Use Benchmarks

**Good fit:**
- Hot paths executed thousands of times (expression evaluation, data transforms)
- Comparing implementation approaches (current vs proposed)
- Detecting regressions in critical code

**Not a good fit:**
- API endpoint latency (use load testing - k6, artillery)
- Database query performance (use query analysis tools)
- Frontend rendering (use browser profiling)
- One-off operations (startup time, migrations)

**Rule of thumb:** If it runs millions of times per day across all users, benchmark it.

## Commands

```bash
pnpm --filter=@n8n/performance bench          # Run benchmarks
pnpm --filter=@n8n/performance bench:baseline # Save new baseline
pnpm --filter=@n8n/performance bench:ci       # CI check (fails if >10% slower)
```

## Adding a Benchmark

### 1. Create a bench file

```typescript
// benchmarks/my-feature/thing.bench.ts
import { bench, describe } from 'vitest';

describe('My Feature', () => {
  bench('operation name', () => {
    // Code to measure - runs thousands of times
    doTheThing();
  });
});
```

### 2. Add setup outside the bench function

```typescript
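import { bench, describe } from 'vitest';

// Setup runs once at module load and is not measured.
// createTestData and MyClass are placeholders for your own code.
const data = createTestData();
const instance = new MyClass();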

describe('My Feature', () => {
  bench('with small input', () => {
    instance.process(data.small);
  });

  bench('with large input', () => {
    instance.process(data.large);
  });
});
```

### 3. Add warmup if needed

```typescript
// Warmup gives the JIT a chance to optimize the hot path before measuring
for (let i = 0; i < 1000; i++) {
  instance.process(data.small);
}

describe('My Feature', () => {
  // Benchmarks now mostly measure the optimized hot path rather than JIT compilation
});
```
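
To make the effect of warmup concrete, here is a minimal standalone sketch that does not use Vitest. The `measure` helper and its option names are hypothetical and exist only for illustration: it runs untimed warmup iterations first, then averages the timed ones.

```typescript
// Hypothetical helper for illustration; not part of Vitest or this package.
interface MeasureOptions {
  warmupIterations: number; // untimed runs that give the JIT a chance to optimize
  iterations: number; // timed runs that contribute to the result
}

function measure(fn: () => void, { warmupIterations, iterations }: MeasureOptions): number {
  // Warmup phase: results are discarded.
  for (let i = 0; i < warmupIterations; i++) fn();

  // Measured phase: time the whole batch, then average.
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  const elapsedMs = performance.now() - start;

  return elapsedMs / iterations; // mean time per call in ms
}

// Example workload: sum an array of 100 numbers.
const items = Array.from({ length: 100 }, (_, i) => i);
const sum = () => {
  let total = 0;
  for (const n of items) total += n;
  return total;
};

const cold = measure(sum, { warmupIterations: 0, iterations: 1_000 });
const warm = measure(sum, { warmupIterations: 10_000, iterations: 1_000 });
console.log({ cold, warm });
```

The exact numbers depend on the machine and runtime, so treat the output as indicative only. Vitest's `bench` also takes an options object as its third argument, including `warmupIterations` and `warmupTime`, so check those before hand-rolling a loop.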

## Reading Results

```
name                hz      min    max   mean    p99    rme   samples
my operation    20,000   0.04   0.20   0.05   0.10  ±0.5%   10000
```

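| Column | Meaning |
|--------|---------|
| hz | Operations per second (higher = faster) |
| min / max | Fastest and slowest sample, in ms |
| mean | Average time per operation, in ms |
| p99 | 99th percentile, in ms: 99% of samples were at least this fast (tail latency, not the worst case; that is `max`) |
| rme | Relative margin of error, as a percentage of the mean (lower = more reliable) |
| samples | Number of measured iterations |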
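
The columns are related by simple arithmetic, which helps when sanity-checking a row. The sketch below derives them from raw sample times. `summarize` is a hypothetical helper, and the 1.96 factor is a normal-approximation assumption for a 95% margin of error; the benchmarking library's exact method may differ.

```typescript
// Hypothetical helper: derive result columns from raw sample times (ms).
interface Summary {
  hz: number;
  min: number;
  max: number;
  mean: number;
  p99: number;
  rme: number; // relative margin of error, in percent
  samples: number;
}

function summarize(timesMs: number[]): Summary {
  if (timesMs.length === 0) throw new Error('need at least one sample');
  const sorted = [...timesMs].sort((a, b) => a - b);
  const n = sorted.length;
  const mean = sorted.reduce((acc, t) => acc + t, 0) / n;

  // Sample variance (n - 1 denominator); zero when there is a single sample.
  const variance = n > 1 ? sorted.reduce((acc, t) => acc + (t - mean) ** 2, 0) / (n - 1) : 0;
  const standardError = Math.sqrt(variance / n);

  // Assumption: normal approximation (z = 1.96) for a 95% confidence interval.
  const marginOfError = 1.96 * standardError;

  return {
    hz: 1000 / mean, // operations per second, since mean is in ms
    min: sorted[0],
    max: sorted[n - 1],
    mean,
    // Nearest-rank percentile: smallest value with at least 99% of samples at or below it.
    p99: sorted[Math.ceil(0.99 * n) - 1],
    rme: (marginOfError / mean) * 100,
    samples: n,
  };
}

const summary = summarize([0.5, 0.5, 0.25, 0.75]);
console.log(summary.mean, summary.hz); // 0.5 2000
```

For the sample row above, a mean of 0.05 ms corresponds to 1000 / 0.05 = 20,000 operations per second, which matches the `hz` column.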

## Regression Detection

Benchmarks are compared against a saved baseline:

- **>10% slower** = regression (CI fails)
- **>10% faster** = improvement (consider updating baseline)
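
The threshold check can be sketched as follows. This is not the code behind `bench:ci`: the `classify` function, the `BenchResult` shape, and the choice to compare `hz` are assumptions made for illustration.

```typescript
// Hypothetical shapes and names; the real bench:ci script may differ.
interface BenchResult {
  name: string;
  hz: number; // operations per second
}

type Verdict = 'regression' | 'improvement' | 'unchanged';

const THRESHOLD = 0.1; // 10%

function classify(baseline: BenchResult, current: BenchResult): Verdict {
  // Relative change in throughput: negative means slower than baseline.
  const change = (current.hz - baseline.hz) / baseline.hz;
  if (change < -THRESHOLD) return 'regression';
  if (change > THRESHOLD) return 'improvement';
  return 'unchanged';
}

const baseline: BenchResult = { name: 'my operation', hz: 20_000 };

console.log(classify(baseline, { name: 'my operation', hz: 17_000 })); // regression
console.log(classify(baseline, { name: 'my operation', hz: 19_000 })); // unchanged
console.log(classify(baseline, { name: 'my operation', hz: 23_000 })); // improvement
```

Note that "10% slower" measured on throughput (`hz`) is not the same as a 10% increase in `mean` time, so a real check should state which metric it compares.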

### Local Workflow

```bash
# 1. Before making changes, save a baseline
pnpm --filter=@n8n/performance bench:baseline

# 2. Make your changes/refactors

# 3. Check for regressions
pnpm --filter=@n8n/performance bench:ci
```

### After Intentional Improvements

```bash
# Save new baseline to reflect the improvement
pnpm --filter=@n8n/performance bench:baseline
```

## Current Benchmarks

| Area | What it measures | Why it matters |
|------|------------------|----------------|
| Expression Engine | `={{ }}` evaluation speed | Runs for every node parameter |

## Current Status

This is a proof-of-concept for local regression detection.

### CI Integration (TODO)

Baselines are hardware-specific (an 8-core MacBook baseline is meaningless on a 2-core runner). CI needs its own baseline management:

- **Option A:** Store baselines as CI artifacts, restore before comparison
- **Option B:** External storage (S3, dedicated benchmark service)
- **Option C:** Compare against previous CI run on same runner type
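
Whichever option is chosen, a baseline has to be matched to the hardware that produced it. One way to sketch that is a storage key derived from the runner's CPU and platform. The `baselineKey` function and the file naming are hypothetical and not part of this package.

```typescript
import { arch, cpus, platform } from 'node:os';

// Hypothetical helper: build a storage key that changes when the hardware changes.
function baselineKey(): string {
  const cpuList = cpus();
  // cpus() can be empty in some restricted environments, so fall back to 'unknown'.
  const model = cpuList[0]?.model ?? 'unknown';
  const slug = model
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
  return `${platform()}-${arch()}-${cpuList.length}cpu-${slug || 'unknown'}`;
}

// Example: use the key as a file name for the stored baseline.
const baselineFile = `baseline-${baselineKey()}.json`;
console.log(baselineFile);
```

A key like this only distinguishes machines by CPU model, core count, and platform. Shared or virtualized runners can still vary between runs, so it narrows the comparison rather than guaranteeing identical conditions.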

## Known Limitations

- **Local noise**: Background processes affect results. Run multiple times to verify.
- **Baselines are machine-specific**: Do not commit baselines to git. They must be generated on the same hardware they'll be compared against.
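
One simple way to act on the first point is to repeat a benchmark several times and look at the median and the spread between runs. The helper names and the sample numbers below are hypothetical.

```typescript
// Hypothetical helpers for comparing repeated runs of the same benchmark.
function median(values: number[]): number {
  if (values.length === 0) throw new Error('need at least one value');
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Spread between the fastest and slowest run, relative to the median.
function relativeSpread(values: number[]): number {
  return (Math.max(...values) - Math.min(...values)) / median(values);
}

// hz values from five runs of the same benchmark (made-up numbers).
const runs = [20_000, 19_000, 21_000, 20_500, 16_000];

console.log(median(runs)); // 20000
console.log(relativeSpread(runs)); // 0.25
```

Here the spread between runs (25%) is larger than the 10% regression threshold, so any single run from this set would not be a trustworthy basis for a pass or fail decision.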

## Tips

1. **Keep benchmarks focused** - one operation per bench, not entire workflows
2. **Use realistic data sizes** - 100 items is typical, 10k is a stress test
3. **Compare approaches** - benchmark both before deciding
4. **Don't over-benchmark** - only critical hot paths need this

test(benchmark): Add expression engine performance benchmarks (no-changelog) (#24199) Co-authored-by: Danny Martini <danny@n8n.io> 2026-02-06 11:38:25 +00:00
