# Performance Benchmarks

Microbenchmarks for measuring and tracking performance of critical code paths.

## When to Use Benchmarks

**Good fit:**

- Hot paths executed thousands of times (expression evaluation, data transforms)
- Comparing implementation approaches (current vs. proposed)
- Detecting regressions in critical code

**Not a good fit:**

- API endpoint latency (use load testing: k6, Artillery)
- Database query performance (use query analysis tools)
- Frontend rendering (use browser profiling)
- One-off operations (startup time, migrations)

**Rule of thumb:** If it runs millions of times per day across all users, benchmark it.

## Commands

```bash
pnpm --filter=@n8n/performance bench           # Run benchmarks
pnpm --filter=@n8n/performance bench:baseline  # Save new baseline
pnpm --filter=@n8n/performance bench:ci        # CI check (fails if >10% slower)
```
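
These scripts run Vitest in benchmark mode. As a rough illustration of how `*.bench.ts` files might be discovered, here is a minimal config sketch; the actual `vitest.config.ts` in this package may differ:

```ts
// Hypothetical vitest.config.ts sketch (not necessarily this package's real config)
import { defineConfig } from 'vitest/config';

export default defineConfig({
	test: {
		benchmark: {
			// Pick up files such as benchmarks/my-feature/thing.bench.ts
			include: ['benchmarks/**/*.bench.ts'],
		},
	},
});
```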

## Adding a Benchmark

### 1. Create a bench file

```ts
// benchmarks/my-feature/thing.bench.ts
import { bench, describe } from 'vitest';

describe('My Feature', () => {
	bench('operation name', () => {
		// Code to measure - runs thousands of times
		doTheThing();
	});
});
```

### 2. Add setup outside the bench function

```ts
// Setup runs once and is not measured
const data = createTestData();
const instance = new MyClass();

describe('My Feature', () => {
	bench('with small input', () => {
		instance.process(data.small);
	});

	bench('with large input', () => {
		instance.process(data.large);
	});
});
```

### 3. Add warmup if needed

```ts
// Warmup ensures JIT compilation is done before measuring
for (let i = 0; i < 1000; i++) {
	instance.process(data.small);
}

describe('My Feature', () => {
	// Now benchmarks measure the hot path, not JIT compilation
});
```
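
A manual warmup loop works, but Vitest's `bench` also accepts per-benchmark options forwarded to tinybench, so warmup can usually be configured declaratively instead. A sketch reusing `instance` and `data` from the setup example above, assuming the standard `warmupIterations`/`warmupTime` options:

```ts
import { bench, describe } from 'vitest';

describe('My Feature', () => {
	// Hypothetical alternative: let the runner warm up before sampling
	bench(
		'with small input',
		() => {
			instance.process(data.small);
		},
		{ warmupIterations: 1000, warmupTime: 100 },
	);
});
```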

## Reading Results

```
name              hz     min    max   mean    p99    rme   samples
my operation  20,000   0.04   0.20   0.05   0.10  ±0.5%    10000
```

| Column | Meaning |
| --- | --- |
| `hz` | Operations per second (higher = faster) |
| `mean` | Average time per operation, in ms |
| `p99` | 99th percentile - worst-case latency |
| `rme` | Margin of error - lower = more reliable |
| `samples` | Number of iterations run |

## Regression Detection

Benchmarks are compared against a saved baseline (a rough sketch of the comparison logic follows this list):

- **>10% slower** = regression (CI fails)
- **>10% faster** = improvement (consider updating the baseline)
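
Assuming the baseline stores ops/sec (`hz`) per benchmark name, the comparison amounts to something like the following; the actual `bench:ci` script may work differently:

```ts
// Hypothetical sketch of baseline comparison, not the real bench:ci implementation.
interface BenchResult {
	name: string;
	hz: number; // operations per second
}

function findRegressions(
	baseline: BenchResult[],
	current: BenchResult[],
	threshold = 0.1,
): string[] {
	const baselineHz = new Map<string, number>(baseline.map((r) => [r.name, r.hz]));
	const regressions: string[] = [];
	for (const result of current) {
		const base = baselineHz.get(result.name);
		if (base === undefined) continue; // new benchmark, nothing to compare against
		const change = (result.hz - base) / base; // negative = fewer ops/sec = slower
		if (change < -threshold) {
			regressions.push(`${result.name}: ${(-change * 100).toFixed(1)}% slower`);
		}
	}
	return regressions;
}
```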

### Local Workflow

```bash
# 1. Before making changes, save a baseline
pnpm --filter=@n8n/performance bench:baseline

# 2. Make your changes/refactors

# 3. Check for regressions
pnpm --filter=@n8n/performance bench:ci
```

### After Intentional Improvements

```bash
# Save a new baseline to reflect the improvement
pnpm --filter=@n8n/performance bench:baseline
```

## Current Benchmarks

| Area | What it measures | Why it matters |
| --- | --- | --- |
| Expression Engine | `={{ }}` evaluation speed | Runs for every node parameter |

## Current Status

This is a proof of concept for local regression detection.

### CI Integration (TODO)

Baselines are hardware-specific (an 8-core MacBook baseline is meaningless on a 2-core runner), so CI needs its own baseline management. Options (a rough shape of Option A is sketched after this list):

- **Option A:** Store baselines as CI artifacts and restore them before comparison
- **Option B:** External storage (S3, a dedicated benchmark service)
- **Option C:** Compare against the previous CI run on the same runner type
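
For illustration only, Option A might take roughly this shape in a CI job; the artifact restore/upload steps are placeholders, since the mechanism depends on the CI provider:

```bash
# Rough sketch of Option A - not an existing workflow.

# 1. Restore the previous baseline artifact for this runner type (CI-provider step, not shown)

# 2. Compare the current run against it
pnpm --filter=@n8n/performance bench:ci

# 3. On the main branch, refresh the baseline and upload it as a new artifact (CI-provider step, not shown)
pnpm --filter=@n8n/performance bench:baseline
```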

## Known Limitations

- **Local noise:** Background processes affect results. Run multiple times to verify.
- **Baselines are machine-specific:** They cannot be committed to git; they must be generated on the same hardware they'll be compared against.

## Tips

1. Keep benchmarks focused - one thing per bench, not entire workflows
2. Use realistic data sizes - 100 items is typical, 10k is a stress test
3. Compare approaches - benchmark both before deciding
4. Don't over-benchmark - only critical hot paths need this