# Performance Benchmarks

Microbenchmarks for measuring and tracking the performance of critical code paths.
## When to Use Benchmarks

**Good fit:**

- Hot paths executed thousands of times (expression evaluation, data transforms)
- Comparing implementation approaches (current vs. proposed)
- Detecting regressions in critical code

**Not a good fit:**

- API endpoint latency (use load testing: k6, Artillery)
- Database query performance (use query analysis tools)
- Frontend rendering (use browser profiling)
- One-off operations (startup time, migrations)

Rule of thumb: if it runs millions of times per day across all users, benchmark it.
## Commands

```bash
pnpm --filter=@n8n/performance bench           # Run benchmarks
pnpm --filter=@n8n/performance bench:baseline  # Save a new baseline
pnpm --filter=@n8n/performance bench:ci        # CI check (fails if >10% slower)
```
## Adding a Benchmark

1. Create a bench file

```typescript
// benchmarks/my-feature/thing.bench.ts
import { bench, describe } from 'vitest';

describe('My Feature', () => {
  bench('operation name', () => {
    // Code to measure - runs thousands of times
    doTheThing();
  });
});
```
2. Add setup outside the bench function

```typescript
// Setup runs once and is not measured
const data = createTestData();
const instance = new MyClass();

describe('My Feature', () => {
  bench('with small input', () => {
    instance.process(data.small);
  });

  bench('with large input', () => {
    instance.process(data.large);
  });
});
```
3. Add warmup if needed

```typescript
// Warmup ensures JIT compilation is done before measuring
for (let i = 0; i < 1000; i++) {
  instance.process(data.small);
}

describe('My Feature', () => {
  // Now benchmarks measure the hot path, not JIT compilation
});
```
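Depending on the vitest version in use, `bench` also accepts tinybench-style timing options, which can replace the manual warmup loop. A minimal sketch, assuming those options are available and reusing the hypothetical `createTestData`/`MyClass` setup from step 2:

```typescript
import { bench, describe } from 'vitest';

// Same hypothetical setup as in step 2
const data = createTestData();
const instance = new MyClass();

describe('My Feature', () => {
  bench(
    'with small input',
    () => {
      instance.process(data.small);
    },
    // Assumed tinybench-style options; confirm they exist in the installed vitest version.
    { warmupIterations: 1000, time: 500 },
  );
});
```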
## Reading Results

```
name          hz      min   max   mean  p99   rme    samples
my operation  20,000  0.04  0.20  0.05  0.10  ±0.5%  10000
```

| Column | Meaning |
|---|---|
| hz | Operations per second (higher = faster) |
| mean | Average time per operation, in ms |
| p99 | 99th percentile - worst-case latency |
| rme | Relative margin of error - lower = more reliable |
| samples | Number of iterations run |
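As a sanity check, hz and mean should roughly agree: 20,000 operations per second works out to 1000 / 20,000 = 0.05 ms per operation, which matches the mean in the example row.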
## Regression Detection

Benchmarks are compared against a saved baseline (the comparison is sketched after this list):

- >10% slower = regression (CI fails)
- >10% faster = improvement (consider updating the baseline)
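For illustration only (this is not the package's actual comparison code), the check boils down to comparing each benchmark's ops/sec against the baseline with a 10% tolerance:

```typescript
// Hypothetical shape of a saved baseline: benchmark name -> ops/sec (hz).
type Baseline = Record<string, number>;

const THRESHOLD = 0.1; // 10%

// Returns the names of benchmarks that regressed beyond the threshold.
function findRegressions(baseline: Baseline, current: Baseline): string[] {
  return Object.keys(baseline).filter((name) => {
    const before = baseline[name];
    const after = current[name];
    if (after === undefined) return false; // benchmark was removed or renamed
    // Lower hz means slower: a drop of more than 10% counts as a regression.
    return after < before * (1 - THRESHOLD);
  });
}
```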
### Local Workflow

```bash
# 1. Before making changes, save a baseline
pnpm --filter=@n8n/performance bench:baseline

# 2. Make your changes/refactors

# 3. Check for regressions
pnpm --filter=@n8n/performance bench:ci
```
### After Intentional Improvements

```bash
# Save a new baseline to reflect the improvement
pnpm --filter=@n8n/performance bench:baseline
```
## Current Benchmarks

| Area | What it measures | Why it matters |
|---|---|---|
| Expression Engine | `={{ }}` evaluation speed | Runs for every node parameter |
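For orientation, a benchmark in this area follows the same vitest pattern shown above. The sketch below is illustrative only: `evaluateExpression` and the fixture data are hypothetical stand-ins, not the actual bench file in this package.

```typescript
// benchmarks/expression-engine/evaluation.bench.ts (illustrative path)
import { bench, describe } from 'vitest';

// Hypothetical evaluator - the real bench file wires up n8n's expression
// engine and workflow data instead.
declare function evaluateExpression(expression: string, data: object): unknown;

const item = { json: { name: 'Jane', total: 42 } };

describe('Expression Engine', () => {
  bench('simple property access', () => {
    evaluateExpression('={{ $json.name }}', item);
  });

  bench('arithmetic expression', () => {
    evaluateExpression('={{ $json.total * 2 }}', item);
  });
});
```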
## Current Status

This is a proof of concept for local regression detection.

### CI Integration (TODO)

Baselines are hardware-specific (an 8-core MacBook baseline is meaningless on a 2-core runner), so CI needs its own baseline management:

- Option A: Store baselines as CI artifacts and restore them before comparison
- Option B: Use external storage (S3 or a dedicated benchmark service)
- Option C: Compare against the previous CI run on the same runner type
## Known Limitations

- Local noise: background processes affect results. Run benchmarks multiple times to verify.
- Baselines are machine-specific: they cannot be committed to git and must be generated on the same hardware they will be compared against.
## Tips

- Keep benchmarks focused: one thing per bench, not full workflows
- Use realistic data sizes: 100 items is typical, 10k is a stress test (see the sketch after this list)
- Compare approaches: benchmark both before deciding
- Don't over-benchmark: only critical hot paths need this
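To make the data-size tip concrete, a fixture helper might look like the following; `createItems` is a hypothetical helper introduced here for illustration, not part of this package:

```typescript
// Hypothetical fixture helper for bench files: n8n-style items of varying size.
function createItems(count: number) {
  return Array.from({ length: count }, (_, i) => ({
    json: { id: i, name: `item-${i}`, value: Math.random() },
  }));
}

const data = {
  small: createItems(100),    // typical payload
  large: createItems(10_000), // stress test
};
```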