# Performance Benchmarks

Microbenchmarks for measuring and tracking performance of critical code paths.

## When to Use Benchmarks

**Good fit:**

- Hot paths executed thousands of times (expression evaluation, data transforms)
- Comparing implementation approaches (current vs proposed)
- Detecting regressions in critical code

**Not a good fit:**

- API endpoint latency (use load testing - k6, artillery)
- Database query performance (use query analysis tools)
- Frontend rendering (use browser profiling)
- One-off operations (startup time, migrations)

**Rule of thumb:** If it runs millions of times per day across all users, benchmark it.

## Commands

```bash
pnpm --filter=@n8n/performance bench           # Run benchmarks
pnpm --filter=@n8n/performance bench:baseline  # Save new baseline
pnpm --filter=@n8n/performance bench:ci        # CI check (fails if >10% slower)
```

## Adding a Benchmark

### 1. Create a bench file

```typescript
// benchmarks/my-feature/thing.bench.ts
import { bench, describe } from 'vitest';

describe('My Feature', () => {
	bench('operation name', () => {
		// Code to measure - runs thousands of times
		doTheThing();
	});
});
```

### 2. Add setup outside the bench function

```typescript
// Setup runs once, not measured
const data = createTestData();
const instance = new MyClass();

describe('My Feature', () => {
	bench('with small input', () => {
		instance.process(data.small);
	});

	bench('with large input', () => {
		instance.process(data.large);
	});
});
```

### 3. Add warmup if needed

```typescript
// Warmup ensures JIT compilation is done before measuring
for (let i = 0; i < 1000; i++) {
	instance.process(data.small);
}

describe('My Feature', () => {
	// Now benchmarks measure the hot path, not JIT compilation
});
```

## Reading Results

```
name           hz      min   max   mean  p99   rme    samples
my operation   20,000  0.04  0.20  0.05  0.10  ±0.5%  10000
```

| Column  | Meaning                                            |
| ------- | -------------------------------------------------- |
| hz      | Operations per second (higher = faster)            |
| mean    | Average time per operation in ms                   |
| p99     | 99th percentile - worst-case latency               |
| rme     | Relative margin of error - lower = more reliable   |
| samples | Number of iterations run                           |

## Regression Detection

Benchmarks are compared against a saved baseline:

- **>10% slower** = regression (CI fails)
- **>10% faster** = improvement (consider updating the baseline)

### Local Workflow

```bash
# 1. Before making changes, save a baseline
pnpm --filter=@n8n/performance bench:baseline

# 2. Make your changes/refactors

# 3. Check for regressions
pnpm --filter=@n8n/performance bench:ci
```

### After Intentional Improvements

```bash
# Save a new baseline to reflect the improvement
pnpm --filter=@n8n/performance bench:baseline
```

## Current Benchmarks

| Area              | What it measures          | Why it matters                |
| ----------------- | ------------------------- | ----------------------------- |
| Expression Engine | `={{ }}` evaluation speed | Runs for every node parameter |

## Current Status

This is a proof of concept for local regression detection.

### CI Integration (TODO)

Baselines are hardware-specific (an 8-core MacBook baseline is meaningless on a 2-core runner), so CI needs its own baseline management:

- **Option A:** Store baselines as CI artifacts and restore them before comparison
- **Option B:** External storage (S3, a dedicated benchmark service)
- **Option C:** Compare against the previous CI run on the same runner type

## Known Limitations

- **Local noise**: Background processes affect results. Run multiple times to verify (one mitigation is sketched below).
- **Baselines are machine-specific**: Baselines cannot be committed to git - they must be generated on the same hardware they will be compared against.
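If local noise makes results hard to trust, another option is to give each benchmark a longer warmup and measurement window. A minimal sketch, assuming the standard Vitest `bench` options (backed by tinybench); the option values and the `MyClass`/`createTestData` setup are illustrative, not project defaults:

```typescript
import { bench, describe } from 'vitest';

// Same hypothetical setup as the examples above - created once, not measured.
const data = createTestData();
const instance = new MyClass();

describe('My Feature', () => {
	bench(
		'with small input',
		() => {
			instance.process(data.small);
		},
		// Longer warmup and measurement time trades run duration for stability.
		{ warmupTime: 500, warmupIterations: 100, time: 2000 },
	);
});
```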
## Tips

1. **Keep benchmarks focused** - one thing per bench, not entire workflows
2. **Use realistic data sizes** - 100 items is a typical case, 10k is a stress test
3. **Compare approaches** - benchmark both implementations before deciding (see the sketch below)
4. **Don't over-benchmark** - only critical hot paths need this
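For tip 3, both approaches can live in one bench file so they run against identical input under the same conditions. A minimal sketch; the file path, the `joinWithLoop`/`joinWithConcat` implementations, and the input size are hypothetical examples, not existing project code:

```typescript
// benchmarks/example/string-join.bench.ts (hypothetical path)
import { bench, describe } from 'vitest';

// Two candidate implementations of the same operation.
const joinWithLoop = (items: string[]): string => {
	let out = '';
	for (const item of items) out += item;
	return out;
};

const joinWithConcat = (items: string[]): string => items.join('');

// Shared, realistic input (100 items) created once, outside the measured code.
const items = Array.from({ length: 100 }, (_, i) => `item-${i}`);

describe('string join: loop vs Array#join', () => {
	bench('loop + concatenation', () => {
		joinWithLoop(items);
	});

	bench('Array#join', () => {
		joinWithConcat(items);
	});
});
```

Because both benches share the same input and run in the same process, the resulting `hz` numbers are directly comparable, which is what makes this useful for deciding between approaches before committing to one.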