Benchmark

Measures queue overhead and, more importantly, validates the weighted tenant-fairness claim with a direct correctness check. The benchmark asserts that dispatch ratios across three tenants with weights 3:2:1 match the ideal ratio within ±10%.

What We Measure

| Benchmark | Description |
| --- | --- |
| A) Baseline: direct async function call | The floor you're replacing — no queue at all |
| B) enqueue() — no drain | Per-call cost of handing a job to the backend |
| C) enqueue + drain, 1 job — end-to-end latency | Single-job latency from enqueue to handler completion |
| D) Throughput: 5000 jobs, 1 tenant (FIFO) | Single-queue drain cost — exposes Array.shift() O(N) |
| E) Throughput: 5000 jobs, 100 tenants equal weight | Partitioned workload — shows fair scheduler at scale |
| F) Fairness correctness: 3 tenants at weights 3:2:1 | Dispatch ratios must match weight ratios within ±10% |

Key Assertion: Weighted Fairness Correctness

The benchmark enqueues 2000 jobs each for tenants a, b, c with weights 3, 2, 1. It then samples the first 3000 dispatches — the contested window in which all three tenants still have queued work. The observed ratios must land within ±10% of the ideal:

  • Expected: a = 0.500, b = 0.333, c = 0.167
  • If any tenant's observed ratio deviates more than ±10% from its expected ratio, the benchmark exits with code 1

This validates the central claim of the package: paying customers at higher weights actually get proportionally more worker slots. If the scheduler drifts, this test catches it — not a property assertion from the README, but a measured runtime behavior.
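
The ratio check described above is simple arithmetic. Here is a minimal sketch (hypothetical names, not the package's actual bench code) of comparing observed dispatch ratios against ideal weight ratios using relative deviation:

```typescript
// Hypothetical sketch of the fairness assertion: given a log of which
// tenant each dispatch went to, compute each tenant's observed share and
// its relative deviation from the ideal share (weight / total weight).
type Weights = Record<string, number>;

function worstDeviation(dispatches: string[], weights: Weights): number {
  const totalWeight = Object.values(weights).reduce((a, b) => a + b, 0);
  const counts: Record<string, number> = {};
  for (const t of dispatches) counts[t] = (counts[t] ?? 0) + 1;
  let worst = 0;
  for (const [tenant, w] of Object.entries(weights)) {
    const expected = w / totalWeight; // e.g. 3 / 6 = 0.500 for tenant a
    const observed = (counts[tenant] ?? 0) / dispatches.length;
    worst = Math.max(worst, Math.abs(observed - expected) / expected);
  }
  return worst;
}

// A perfectly interleaved 3:2:1 window has zero deviation and passes ±10%.
const sample = Array.from(
  { length: 3000 },
  (_, i) => ["a", "a", "a", "b", "b", "c"][i % 6],
);
console.log(worstDeviation(sample, { a: 3, b: 2, c: 1 }) <= 0.1); // true
```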

Test Setup

  • Backend: FakeJobsService wrapping InMemoryBackend (single-process, no Redis)
  • Iterations: 10,000 per latency scenario
  • Warmup: 1000 iterations
  • Throughput jobs: 5000 per workload
  • Fairness jobs: 6000 (2000 per tenant)
  • Fairness tolerance: ±10%
  • Redis: Not required — bench isolates library overhead from transport I/O

Running Locally

```bash
cd jobs
npm install
npm run bench

# Custom scale
npx ts-node bench/jobs.bench.ts --iterations 20000 --throughput-jobs 10000
```

Results

Measured on Windows 11, Node.js 24, Intel Core i7-10750H. Your results will vary.

Per-call latency

| Benchmark | Avg | P50 | P95 | P99 |
| --- | --- | --- | --- | --- |
| A) Baseline: direct async call | 0.3µs | 0.2µs | 0.4µs | 0.7µs |
| B) enqueue() — no drain | 2.0µs | 1.6µs | 2.6µs | 11.0µs |
| C) enqueue + drain, 1 job | 6.0µs | 3.4µs | 5.6µs | 17.2µs |

Throughput

| Scenario | Total time | Jobs/sec |
| --- | --- | --- |
| D) 5000 jobs, 1 tenant | 82.9ms | ~60,000 |
| E) 5000 jobs, 100 tenants | 21.1ms | ~237,000 |

Fairness correctness

| Tenant | Weight | Expected ratio | Observed ratio | Deviation |
| --- | --- | --- | --- | --- |
| a | 3 | 0.500 | 0.500 | 0.0% |
| b | 2 | 0.333 | 0.333 | 0.0% |
| c | 1 | 0.167 | 0.167 | 0.2% |

Worst deviation: 0.2% (tolerance: ±10%) — ✓ PASS

Derived numbers

  • Queue enqueue overhead (B − A): ~1.7µs
  • End-to-end overhead (C − A): ~5.7µs
  • E vs D: Partitioned workload is ~4× faster on total throughput than a single queue of the same size

Interpretation

Per-call overhead is low. enqueue() adds ~1.7µs over a direct async call — that's the cost of serializing context into the job payload, inserting into the backend, and updating scheduler state. End-to-end (enqueue + dispatch + handler invocation) lands at ~6µs P50, which makes the library feel instantaneous for single-job flows.

Single-queue throughput has a known ceiling. D's ~60K jobs/sec on a single-tenant 5000-job queue is limited by Array.shift() being O(N) — the waiting queue shrinks by one on every pick, and shifting a large array dominates the loop. This is a real characteristic of the current in-memory scheduler: bulk-enqueue workloads should partition across tenants or use the BullMQ backend (which uses Redis LPOP/BRPOP primitives under the hood).
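
To illustrate the bottleneck (this is a standalone sketch, not the library's code): Array.shift() moves every remaining element down one slot, so draining an N-item queue by repeated shift() is O(N²) overall, while tracking a head index keeps each pick O(1) and the whole drain O(N):

```typescript
// Draining by repeated shift(): each pick is O(N) because all remaining
// elements are relocated, so the full drain is O(N^2).
function drainByShift(queue: number[]): number[] {
  const out: number[] = [];
  while (queue.length > 0) out.push(queue.shift()!); // O(N) per pick
  return out;
}

// Draining with a head index: each pick is O(1), full drain is O(N).
function drainByIndex(queue: number[]): number[] {
  const out: number[] = [];
  for (let head = 0; head < queue.length; head++) out.push(queue[head]); // O(1) per pick
  return out;
}

const jobs = Array.from({ length: 5000 }, (_, i) => i);
console.log(drainByShift([...jobs]).length); // 5000, same order as drainByIndex
```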

Partitioned workloads are faster. E processes 5000 jobs across 100 tenant queues of ~50 each in 21ms (~237K jobs/sec). Small queues → cheap shift → fast drain. This is the shape most real SaaS workloads take (many tenants, bursty per-tenant activity), which is exactly what the fair scheduler is designed for.

Weighted fairness is measurably correct. The observed 3:2:1 dispatch ratios land within 0.2% of the ideal — the scheduler is implementing weighted round-robin faithfully, not just approximately. For production you can trust that a tenant with setTenantWeight(..., 3) gets 3× the worker slots of a tenant with weight 1 within a contested window.
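
One scheduling strategy that yields exactly this kind of tight interleaving is smooth weighted round-robin. A minimal sketch (an assumption about the technique, not the package's actual scheduler):

```typescript
// Smooth weighted round-robin: each pick, every tenant gains credit equal
// to its weight; the highest-credit tenant wins and pays back the total
// weight. Over any window of (total weight) picks, each tenant is picked
// exactly (its weight) times.
function smoothWrr(weights: Record<string, number>, picks: number): string[] {
  const total = Object.values(weights).reduce((a, b) => a + b, 0);
  const credit: Record<string, number> = {};
  const order: string[] = [];
  for (let i = 0; i < picks; i++) {
    let best = "";
    for (const [tenant, w] of Object.entries(weights)) {
      credit[tenant] = (credit[tenant] ?? 0) + w;
      if (best === "" || credit[tenant] > credit[best]) best = tenant;
    }
    credit[best] -= total;
    order.push(best);
  }
  return order;
}

const firstSix = smoothWrr({ a: 3, b: 2, c: 1 }, 6);
console.log(firstSix.filter((t) => t === "a").length); // 3 of the 6 picks
```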

Production Notes

  • In-memory numbers are the library floor — BullMQ adds Redis round-trips (~0.1–1ms per operation). For Redis-backed workloads, the dominant cost shifts to network I/O.
  • In-memory fairness is process-local. If you run multiple replicas, each has its own scheduler — use BullMQ + stable tenant-to-queue mapping for cross-process fairness.
  • The Array.shift() O(N) issue in the in-memory backend matters for bulk-ingest workloads. For normal SaaS shapes (many tenants, moderate per-tenant throughput) it's not visible.
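
A stable tenant-to-queue mapping, as mentioned above, just needs a deterministic hash so every replica routes a given tenant to the same named queue. A hypothetical sketch (the function and queue-name scheme are illustrative, not part of the package's API):

```typescript
// Deterministic tenant-to-queue mapping: hash the tenant id to one of N
// fixed queue names. Every process computes the same answer, so fairness
// decisions for a tenant always happen in the same queue.
function queueForTenant(tenantId: string, queueCount: number): string {
  let h = 0;
  for (const ch of tenantId) h = (h * 31 + ch.charCodeAt(0)) >>> 0; // FNV-style rolling hash
  return `jobs-${h % queueCount}`;
}

console.log(queueForTenant("acme", 8)); // same queue name on every replica
```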

Released under the MIT License.