# Performance Benchmarks

Microbenchmarks for measuring and tracking performance of critical code paths.

## When to Use Benchmarks

**Good fit:**

- Hot paths executed thousands of times (expression evaluation, data transforms)
- Comparing implementation approaches (current vs. proposed)
- Detecting regressions in critical code

**Not a good fit:**

- API endpoint latency (use load testing: k6, Artillery)
- Database query performance (use query analysis tools)
- Frontend rendering (use browser profiling)
- One-off operations (startup time, migrations)

**Rule of thumb:** If it runs millions of times per day across all users, benchmark it.

## Commands

```bash
pnpm --filter=@n8n/performance bench           # Run benchmarks
pnpm --filter=@n8n/performance bench:baseline  # Save new baseline
pnpm --filter=@n8n/performance bench:ci        # CI check (fails if >10% slower)
```
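
These scripts run Vitest in benchmark mode. As a rough illustration of how `*.bench.ts` files might be discovered, here is a minimal config sketch; the actual `vitest.config.ts` in this package may differ:

```ts
// Hypothetical vitest.config.ts sketch (not necessarily this package's real config)
import { defineConfig } from 'vitest/config';

export default defineConfig({
	test: {
		benchmark: {
			// Pick up files such as benchmarks/my-feature/thing.bench.ts
			include: ['benchmarks/**/*.bench.ts'],
		},
	},
});
```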

## Adding a Benchmark

### 1. Create a bench file

```ts
// benchmarks/my-feature/thing.bench.ts
import { bench, describe } from 'vitest';

describe('My Feature', () => {
	bench('operation name', () => {
		// Code to measure - runs thousands of times
		doTheThing();
	});
});
```

### 2. Add setup outside the bench function

```ts
// Setup runs once and is not measured
const data = createTestData();
const instance = new MyClass();

describe('My Feature', () => {
	bench('with small input', () => {
		instance.process(data.small);
	});

	bench('with large input', () => {
		instance.process(data.large);
	});
});
```

### 3. Add warmup if needed

```ts
// Warmup ensures JIT compilation is done before measuring
for (let i = 0; i < 1000; i++) {
	instance.process(data.small);
}

describe('My Feature', () => {
	// Now benchmarks measure the hot path, not JIT compilation
});
```
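
A manual warmup loop works, but Vitest's `bench` also accepts per-benchmark options forwarded to tinybench, so warmup can usually be configured declaratively instead. A sketch reusing `instance` and `data` from the setup example above, assuming the standard `warmupIterations`/`warmupTime` options:

```ts
import { bench, describe } from 'vitest';

describe('My Feature', () => {
	// Hypothetical alternative: let the runner warm up before sampling
	bench(
		'with small input',
		() => {
			instance.process(data.small);
		},
		{ warmupIterations: 1000, warmupTime: 100 },
	);
});
```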

## Reading Results

```
name              hz     min    max   mean    p99    rme   samples
my operation  20,000   0.04   0.20   0.05   0.10  ±0.5%    10000
```

| Column | Meaning |
| --- | --- |
| `hz` | Operations per second (higher = faster) |
| `mean` | Average time per operation, in ms |
| `p99` | 99th percentile - worst-case latency |
| `rme` | Margin of error - lower = more reliable |
| `samples` | Number of iterations run |

## Regression Detection

Benchmarks are compared against a saved baseline (a rough sketch of the comparison logic follows this list):

- **>10% slower** = regression (CI fails)
- **>10% faster** = improvement (consider updating the baseline)
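
Assuming the baseline stores ops/sec (`hz`) per benchmark name, the comparison amounts to something like the following; the actual `bench:ci` script may work differently:

```ts
// Hypothetical sketch of baseline comparison, not the real bench:ci implementation.
interface BenchResult {
	name: string;
	hz: number; // operations per second
}

function findRegressions(
	baseline: BenchResult[],
	current: BenchResult[],
	threshold = 0.1,
): string[] {
	const baselineHz = new Map<string, number>(baseline.map((r) => [r.name, r.hz]));
	const regressions: string[] = [];
	for (const result of current) {
		const base = baselineHz.get(result.name);
		if (base === undefined) continue; // new benchmark, nothing to compare against
		const change = (result.hz - base) / base; // negative = fewer ops/sec = slower
		if (change < -threshold) {
			regressions.push(`${result.name}: ${(-change * 100).toFixed(1)}% slower`);
		}
	}
	return regressions;
}
```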

### Local Workflow

```bash
# 1. Before making changes, save a baseline
pnpm --filter=@n8n/performance bench:baseline

# 2. Make your changes/refactors

# 3. Check for regressions
pnpm --filter=@n8n/performance bench:ci
```

### After Intentional Improvements

```bash
# Save a new baseline to reflect the improvement
pnpm --filter=@n8n/performance bench:baseline
```

## Current Benchmarks

| Area | What it measures | Why it matters |
| --- | --- | --- |
| Expression Engine | `={{ }}` evaluation speed | Runs for every node parameter |

## Current Status

This is a proof of concept for local regression detection.

### CI Integration (TODO)

Baselines are hardware-specific (an 8-core MacBook baseline is meaningless on a 2-core runner), so CI needs its own baseline management. Options (a rough shape of Option A is sketched after this list):

- **Option A:** Store baselines as CI artifacts and restore them before comparison
- **Option B:** External storage (S3, a dedicated benchmark service)
- **Option C:** Compare against the previous CI run on the same runner type
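
For illustration only, Option A might take roughly this shape in a CI job; the artifact restore/upload steps are placeholders, since the mechanism depends on the CI provider:

```bash
# Rough sketch of Option A - not an existing workflow.

# 1. Restore the previous baseline artifact for this runner type (CI-provider step, not shown)

# 2. Compare the current run against it
pnpm --filter=@n8n/performance bench:ci

# 3. On the main branch, refresh the baseline and upload it as a new artifact (CI-provider step, not shown)
pnpm --filter=@n8n/performance bench:baseline
```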

## Known Limitations

- **Local noise:** Background processes affect results. Run multiple times to verify.
- **Baselines are machine-specific:** They cannot be committed to git; they must be generated on the same hardware they'll be compared against.

## Tips

1. Keep benchmarks focused - one thing per bench, not entire workflows
2. Use realistic data sizes - 100 items is typical, 10k is a stress test
3. Compare approaches - benchmark both before deciding
4. Don't over-benchmark - only critical hot paths need this