## Benchmarks
Behavior-first benchmark presentation for a laptop-scale prototype.
The harness runs the same task stream twice: once with naive execution (every task issues its own backend call) and once with shared execution (overlapping tasks collapse into one call). The numbers are intentionally modest and easy to inspect: the goal is to test the middleware thesis, not to inflate hardware claims.
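As a minimal sketch of that comparison (the `Task` type, key scheme, and function names are illustrative assumptions, not the harness's actual API), the two modes reduce to counting executions over the same task list:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    scenario: str
    key: str  # canonical work key; tasks with equal keys can share one execution

def naive_execs(tasks):
    # Naive mode: every task triggers its own backend execution.
    return len(tasks)

def shared_execs(tasks):
    # Shared mode: tasks with the same work key collapse into one execution.
    return len({t.key for t in tasks})

# Four tasks, two distinct work keys -- mirrors the coding_repo_scan row.
tasks = [
    Task("coding_repo_scan", "scan:src/"),
    Task("coding_repo_scan", "scan:src/"),
    Task("coding_repo_scan", "scan:tests/"),
    Task("coding_repo_scan", "scan:tests/"),
]
saved = naive_execs(tasks) - shared_execs(tasks)
dedup = naive_execs(tasks) / shared_execs(tasks)
print(saved, dedup)  # 2 2.0
```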
### Scenario families
- coding_repo_scan: overlapping repo-understanding tasks from concurrent coding branches
- document_research: repeated evidence extraction over the same corpus
- api_fanout: overlapping outbound API work
- false_collapse_safety: similar-looking tasks that should remain separate
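The families above differ in one key expectation: whether their tasks are supposed to collapse. A hypothetical registry sketch (the dict name and field are assumptions for illustration) makes that explicit, which is useful when scoring results:

```python
# Each family records whether its tasks are expected to overlap; the
# false_collapse_safety family exists precisely because its tasks must not.
SCENARIOS = {
    "coding_repo_scan":       {"overlap_expected": True},
    "document_research":      {"overlap_expected": True},
    "api_fanout":             {"overlap_expected": True},
    "false_collapse_safety":  {"overlap_expected": False},
}

safety_families = [n for n, cfg in SCENARIOS.items() if not cfg["overlap_expected"]]
print(safety_families)  # ['false_collapse_safety']
```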
### Current summary table
| Scenario | Tasks | Executions | Saved | Dedup ratio | False-collapse rate |
|---|---|---|---|---|---|
| coding_repo_scan | 4 | 2 | 2 | 2.0x | 0.00 |
| document_research | 3 | 2 | 1 | 1.5x | 0.00 |
| api_fanout | 3 | 2 | 1 | 1.5x | 0.00 |
| false_collapse_safety | 4 | 4 | 0 | 1.0x | 0.00 |
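The derived columns follow directly from the task and execution counts, so the table is easy to audit. A quick check, with the counts copied from the rows above:

```python
rows = {
    # scenario: (tasks, executions)
    "coding_repo_scan":      (4, 2),
    "document_research":     (3, 2),
    "api_fanout":            (3, 2),
    "false_collapse_safety": (4, 4),
}

for name, (tasks, execs) in rows.items():
    saved = tasks - execs   # executions avoided by sharing
    dedup = tasks / execs   # naive-to-shared execution ratio
    print(f"{name}: saved={saved} dedup={dedup:.1f}x")
```

Running this reproduces the Saved and Dedup ratio columns exactly, including the 1.0x row where sharing correctly did nothing.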
The local harness also renders a benchmark preview chart. It matches the summary table and stays deliberately modest about runtime claims.
### Interpretation notes
A saved execution is a backend call that the naive path would have issued for the same task stream but shared execution did not. A zero false-collapse rate means the current hand-authored safety scenarios stayed separate, as intended. Latency is reported only as a local proxy, because the executors are mocked or lightweight.
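The false-collapse rate can be scored by labeling each merge the middleware performed against ground truth. A sketch, under the assumption (not confirmed by the source) that the harness records which task pairs it collapsed:

```python
def false_collapse_rate(merged_pairs, should_merge):
    """Fraction of merged task pairs that ground truth says must stay separate.

    merged_pairs: iterable of (task_a, task_b) pairs the middleware collapsed.
    should_merge: callable returning True when a pair is genuinely shareable.
    """
    merged_pairs = list(merged_pairs)
    if not merged_pairs:
        return 0.0  # no merges at all cannot produce a false collapse
    bad = sum(1 for a, b in merged_pairs if not should_merge(a, b))
    return bad / len(merged_pairs)

# One merge, and ground truth agrees it was shareable: rate is 0.0,
# matching the 0.00 column in the table above.
print(false_collapse_rate([("t1", "t2")], lambda a, b: True))  # 0.0
```

In the safety family, `should_merge` would return `False` for every pair, so any merge there immediately shows up as a nonzero rate.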