Benchmarks

Behavior-first benchmark presentation for a laptop-scale prototype.

The harness compares naive execution against shared execution. The numbers are intentionally modest and easy to inspect because the goal is to test the middleware thesis, not inflate hardware claims.

  • 14 tasks requested
  • 10 actual executions
  • 4 executions saved
  • 0.00 false-collapse rate
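The headline numbers can be reproduced from the per-scenario counts in the summary table below; a minimal sketch, where the SCENARIOS mapping just restates the table's data:

```python
# Per-scenario (tasks requested, actual executions), restated from the
# local harness's summary table.
SCENARIOS = {
    "coding_repo_scan": (4, 2),
    "document_research": (3, 2),
    "api_fanout": (3, 2),
    "false_collapse_safety": (4, 4),
}

def summarize(scenarios):
    # Saved executions are simply requested tasks minus actual executions.
    tasks = sum(t for t, _ in scenarios.values())
    execs = sum(e for _, e in scenarios.values())
    return {"tasks": tasks, "execs": execs, "saved": tasks - execs}

print(summarize(SCENARIOS))  # {'tasks': 14, 'execs': 10, 'saved': 4}
```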

Scenario families

  • coding_repo_scan: overlapping repo-understanding tasks from concurrent coding branches
  • document_research: repeated evidence extraction over the same corpus
  • api_fanout: overlapping outbound API work
  • false_collapse_safety: similar-looking tasks that should remain separate
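One way to picture shared execution across these families is a cache keyed by a task-normalization function; a minimal sketch under that assumption, where `key_fn`, `shared_execute`, and the task strings are hypothetical names, not the harness's actual API:

```python
def shared_execute(tasks, key_fn, execute):
    """Run one real execution per distinct key and share the result.

    key_fn is a hypothetical normalizer: tasks mapping to the same key
    are assumed safe to collapse. A key_fn that is too coarse is exactly
    what would produce a false collapse.
    """
    cache = {}
    results = []
    for task in tasks:
        key = key_fn(task)
        if key not in cache:
            cache[key] = execute(task)  # the only real backend call
        results.append(cache[key])
    return results, len(tasks) - len(cache)  # (results, executions saved)

# Two identical repo scans collapse; the distinct one does not.
out, saved = shared_execute(["scan repo A", "scan repo A", "scan repo B"],
                            key_fn=lambda t: t, execute=str.upper)
print(saved)  # 1
```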

Current summary table

Scenario                 Tasks   Execs   Saved   Dedup   False-Collapse
coding_repo_scan           4       2       2     2.0x         0.00
document_research          3       2       1     1.5x         0.00
api_fanout                 3       2       1     1.5x         0.00
false_collapse_safety      4       4       0     1.0x         0.00
Benchmark summary view showing scenario-level task counts, actual executions, saved executions, and dedup ratios across the current local harness.
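The Dedup column is requested tasks per actual execution; a one-line restatement, checked against the table's own rows:

```python
def dedup_ratio(tasks, execs):
    # Requested tasks per actual execution; 1.0x means nothing collapsed.
    return tasks / execs

print(dedup_ratio(4, 2))  # 2.0  (coding_repo_scan)
print(dedup_ratio(4, 4))  # 1.0  (false_collapse_safety)
```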

Current benchmark preview from the local harness. The chart matches the summary table and stays deliberately modest about runtime claims.

Interpretation notes

A saved execution is a backend call that would have been issued under naive execution but was not needed under shared execution of the same task stream. A zero false-collapse rate means the current hand-authored safety scenarios stayed separate, as intended. Latency is only a local proxy because the executors are mocked or lightweight.
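The false-collapse check can be framed as hand-labeled must-stay-separate pairs scored against the dedup key; a sketch under that assumption, where `key_fn` and the pair labels are hypothetical stand-ins for the harness's actual scenarios:

```python
def false_collapse_rate(separate_pairs, key_fn):
    """Fraction of must-stay-separate task pairs that wrongly share a key."""
    if not separate_pairs:
        return 0.0
    collapsed = sum(1 for a, b in separate_pairs if key_fn(a) == key_fn(b))
    return collapsed / len(separate_pairs)

# Similar-looking tasks that must remain distinct executions.
pairs = [("lint branch A", "lint branch B"),
         ("fetch user 1", "fetch user 2")]
print(false_collapse_rate(pairs, key_fn=lambda t: t))  # 0.0
```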