Benchmarks

Behavior-first benchmark presentation for a laptop-scale prototype.

The harness compares naive execution against shared execution. The numbers are intentionally modest and easy to inspect because the goal is to test the middleware thesis, not inflate hardware claims.

  • 14 tasks requested
  • 10 actual executions
  • 4 executions saved
  • 0.00 false-collapse rate
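The headline numbers can be reproduced from the per-scenario counts in the summary table below; a minimal sketch, where the SCENARIOS mapping just restates the table's data:

```python
# Per-scenario (tasks requested, actual executions), restated from the
# local harness's summary table.
SCENARIOS = {
    "coding_repo_scan": (4, 2),
    "document_research": (3, 2),
    "api_fanout": (3, 2),
    "false_collapse_safety": (4, 4),
}

def summarize(scenarios):
    # Saved executions are simply requested tasks minus actual executions.
    tasks = sum(t for t, _ in scenarios.values())
    execs = sum(e for _, e in scenarios.values())
    return {"tasks": tasks, "execs": execs, "saved": tasks - execs}

print(summarize(SCENARIOS))  # {'tasks': 14, 'execs': 10, 'saved': 4}
```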

Scenario families

  • coding_repo_scan: overlapping repo-understanding tasks from concurrent coding branches
  • document_research: repeated evidence extraction over the same corpus
  • api_fanout: overlapping outbound API work
  • false_collapse_safety: similar-looking tasks that should remain separate
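One way to picture shared execution across these families is a cache keyed by a task-normalization function; a minimal sketch under that assumption, where `key_fn`, `shared_execute`, and the task strings are hypothetical names, not the harness's actual API:

```python
def shared_execute(tasks, key_fn, execute):
    """Run one real execution per distinct key and share the result.

    key_fn is a hypothetical normalizer: tasks mapping to the same key
    are assumed safe to collapse. A key_fn that is too coarse is exactly
    what would produce a false collapse.
    """
    cache = {}
    results = []
    for task in tasks:
        key = key_fn(task)
        if key not in cache:
            cache[key] = execute(task)  # the only real backend call
        results.append(cache[key])
    return results, len(tasks) - len(cache)  # (results, executions saved)

# Two identical repo scans collapse; the distinct one does not.
out, saved = shared_execute(["scan repo A", "scan repo A", "scan repo B"],
                            key_fn=lambda t: t, execute=str.upper)
print(saved)  # 1
```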

Current summary table

Scenario                 Tasks   Execs   Saved   Dedup   False-Collapse
coding_repo_scan           4       2       2     2.0x         0.00
document_research          3       2       1     1.5x         0.00
api_fanout                 3       2       1     1.5x         0.00
false_collapse_safety      4       4       0     1.0x         0.00
Benchmark summary view showing scenario-level task counts, actual executions, saved executions, and dedup ratios across the current local harness.
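The Dedup column is requested tasks per actual execution; a one-line restatement, checked against the table's own rows:

```python
def dedup_ratio(tasks, execs):
    # Requested tasks per actual execution; 1.0x means nothing collapsed.
    return tasks / execs

print(dedup_ratio(4, 2))  # 2.0  (coding_repo_scan)
print(dedup_ratio(4, 4))  # 1.0  (false_collapse_safety)
```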

Current benchmark preview from the local harness. The chart matches the summary table and stays deliberately modest about runtime claims.

Interpretation notes

A saved execution is a backend call that would have been issued under naive execution but was not needed under shared execution of the same task stream. A zero false-collapse rate means the current hand-authored safety scenarios stayed separate, as intended. Latency is only a local proxy because the executors are mocked or lightweight.
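The false-collapse check can be framed as hand-labeled must-stay-separate pairs scored against the dedup key; a sketch under that assumption, where `key_fn` and the pair labels are hypothetical stand-ins for the harness's actual scenarios:

```python
def false_collapse_rate(separate_pairs, key_fn):
    """Fraction of must-stay-separate task pairs that wrongly share a key."""
    if not separate_pairs:
        return 0.0
    collapsed = sum(1 for a, b in separate_pairs if key_fn(a) == key_fn(b))
    return collapsed / len(separate_pairs)

# Similar-looking tasks that must remain distinct executions.
pairs = [("lint branch A", "lint branch B"),
         ("fetch user 1", "fetch user 2")]
print(false_collapse_rate(pairs, key_fn=lambda t: t))  # 0.0
```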