Laptop-first local middleware

One execution. Many agents. Zero wasted work.

gemma4-wdc is a laptop-first middleware runtime that detects overlapping agent tasks and collapses them into a single shared execution before any backend work is duplicated.

Simulation-first | 0.00 false-collapse rate | Laptop-scale

Run Locally → | View Repository

Animated walkthrough showing multiple agent tasks collapsing into a single shared execution unit with result fan-out and metrics updates.

The bottleneck

Parallel branches can independently ask for the same backend work.

Repo scans, code searches, SQL queries, document extraction, and API calls often recur across nearby agent branches. On modest hardware, every duplicate competes for the same limited CPU, memory, and I/O, so redundant backend work quickly becomes the bottleneck.

The answer

Detect overlap early, hold briefly, execute once, fan out cleanly.

gemma4-wdc adds semantic matching, a bounded admission window, and shared execution units between agents and downstream tools so equivalent work executes once instead of once per branch.

Core runtime loop

Six steps from agent intent to result fan-out.

The runtime is intentionally simple and inspectable: normalize the work, hold compatible tasks briefly, execute once, then return the result to every attached branch.

01

Agent task generation

Analyst, planner, coder, reviewer, and research branches emit structured tool work independently.

02

Task ingress

Every task is registered with agent identity, branch lineage, task type, payload, and arrival time.
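
As a rough illustration of the ingress record described above, the five registered fields could be modeled as a small dataclass. The class name, field names, and the added task_id are hypothetical and are not taken from the repository.

```python
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class TaskRecord:
    """Hypothetical ingress record; all names here are illustrative assumptions."""
    agent_id: str           # agent identity, e.g. "planner"
    branch_lineage: tuple   # path from the root branch to the emitting branch
    task_type: str          # e.g. "repo_scan", "code_search", "sql_query"
    payload: dict           # structured tool arguments
    # Monotonic clock so later window arithmetic is immune to wall-clock jumps.
    arrival_time: float = field(default_factory=time.monotonic)
    # Assumed extra field: a unique id so results can be routed back later.
    task_id: str = field(default_factory=lambda: uuid.uuid4().hex)


record = TaskRecord(
    agent_id="planner",
    branch_lineage=("root", "planner-1"),
    task_type="repo_scan",
    payload={"path": "."},
)
```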

03

Semantic fingerprinting

Exact hashes and lightweight matching heuristics determine whether new work overlaps with a pending execution unit.
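
One way exact hashes and a lightweight heuristic might combine is sketched below. The function names, the "query" payload key, the token-set Jaccard measure, and the 0.8 threshold are all assumptions made for illustration; the project's actual matcher may work differently.

```python
import hashlib
import json
import re


def exact_fingerprint(task_type: str, payload: dict) -> str:
    """Hash the task type plus a canonical (key-sorted) JSON form of the payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{task_type}|{canonical}".encode("utf-8")).hexdigest()


def tokens(text: str) -> frozenset:
    """Lowercase the text and reduce it to a set of alphanumeric tokens."""
    return frozenset(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(a: frozenset, b: frozenset) -> float:
    """Token-set similarity; empty input scores 0.0 so it can never force a match."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlaps(task_a: dict, task_b: dict, threshold: float = 0.8) -> bool:
    """Assumed policy: exact hash match first, then a conservative fuzzy match."""
    if task_a["task_type"] != task_b["task_type"]:
        return False  # different task types never collapse
    if exact_fingerprint(task_a["task_type"], task_a["payload"]) == \
            exact_fingerprint(task_b["task_type"], task_b["payload"]):
        return True
    qa = tokens(task_a["payload"].get("query", ""))
    qb = tokens(task_b["payload"].get("query", ""))
    return jaccard(qa, qb) >= threshold


a = {"task_type": "code_search",
     "payload": {"query": "Where are runtime state transitions defined?"}}
b = {"task_type": "code_search",
     "payload": {"query": "where are the runtime state transitions defined"}}
c = {"task_type": "code_search",
     "payload": {"query": "find usages of the retry decorator"}}
```

In this sketch, a and b share six of seven tokens and would collapse, while c shares none with a and stays separate. A high threshold trades some missed savings for a lower risk of false collapse.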

04

Admission window

The first compatible task opens a bounded, non-resetting window that captures matching subscribers without unbounded delay.
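
The "bounded, non-resetting" property can be illustrated with a small sketch: the deadline is fixed when the window opens, and later arrivals never extend it. The class name, method names, and the 50 ms duration are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class AdmissionWindow:
    """Hypothetical window; the deadline is set once and never moves."""
    opened_at: float                 # arrival time of the first compatible task
    duration: float = 0.05           # assumed 50 ms bound, for illustration only
    subscribers: list = field(default_factory=list)

    @property
    def deadline(self) -> float:
        # Derived only from opened_at, so admitting a task cannot reset it.
        return self.opened_at + self.duration

    def try_admit(self, task_id: str, now: float) -> bool:
        """Attach a matching task if the window is still open."""
        if now >= self.deadline:
            return False             # window closed: caller opens a new unit
        self.subscribers.append(task_id)
        return True


window = AdmissionWindow(opened_at=0.0)
first = window.try_admit("planner-task", now=0.0)
second = window.try_admit("coder-task", now=0.03)
late = window.try_admit("reviewer-task", now=0.06)
```

Because the deadline never moves, the first subscriber waits at most one window duration no matter how many matching tasks keep arriving.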

05

Shared execution unit

One SEU owns execution state, backend invocation, subscribers, and observability for the shared work item.

06

Result fan-out

A single completed backend result is fanned back to every attached branch while metrics record the saved work.
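
Steps 05 and 06 can be sketched together with asyncio futures: each attached branch receives a future, the backend runs once, and its result resolves every future. The class, its methods, and the fake backend are illustrative assumptions, not the project's implementation.

```python
import asyncio


class SharedExecutionUnit:
    """Hypothetical SEU: one backend call shared by every attached branch."""

    def __init__(self, fingerprint, backend):
        self.fingerprint = fingerprint
        self._backend = backend
        self._subscribers = []  # (branch_id, future) pairs

    def attach(self, branch_id):
        """Register a branch and return the future it should await."""
        future = asyncio.get_running_loop().create_future()
        self._subscribers.append((branch_id, future))
        return future

    async def run(self):
        """Invoke the backend once, then fan the outcome out to all subscribers."""
        try:
            result = await self._backend(self.fingerprint)
        except Exception as exc:
            for _, future in self._subscribers:
                if not future.done():
                    future.set_exception(exc)
            return
        for _, future in self._subscribers:
            if not future.done():
                future.set_result(result)


async def demo():
    calls = 0

    async def fake_backend(fingerprint):
        # Stand-in for a real repo scan; counts how often it is invoked.
        nonlocal calls
        calls += 1
        await asyncio.sleep(0)
        return f"scan-result:{fingerprint}"

    seu = SharedExecutionUnit("repo_scan", fake_backend)
    futures = [seu.attach(b) for b in ("planner", "coder", "reviewer")]
    await seu.run()
    results = await asyncio.gather(*futures)
    return calls, results
```

Running asyncio.run(demo()) in this sketch yields one backend call and three identical results, which corresponds to two saved executions.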

Architecture

One local runtime, many lightweight branches.

Simulation mode is the default. A central FastAPI service manages shared execution units, task matching, and observability while lightweight or simulated agents make the middleware value visible on a single machine.

Hybrid mode is optional and supports one real local model adapter without turning the project into a hardware arms race.

Explore architecture →

Architecture diagram showing agents, task ingress, fingerprinting, admission control, shared execution units, execution backends, and metrics.

Coding-agent overlap demo showing planner, coder, and reviewer branches converging on one shared repo-understanding execution unit.

Flagship demo

Coding-agent overlap is the clearest proof point.

Planner, coder, and reviewer branches ask overlapping questions about runtime state transitions, repo structure, and shared execution logic. gemma4-wdc collapses that repo-understanding work into one SEU while keeping unrelated code search separate.

It is a strong consumer-hardware example because the duplicate work is obvious, the payoff is visible, and the hardware assumptions stay honest.

See example walkthroughs →

Benchmarks

Preliminary, local, and intentionally easy to audit.

These are local-harness numbers from mock or lightweight executors. They show visible savings without pretending to prove cluster-scale throughput.

Tasks requested: 14

Actual executions: 10

Executions saved: 4

False-collapse rate: 0.00

Scenario table

Current benchmark breakdown

Methodology →

Scenario	Tasks	Executions	Saved	Dedup ratio
coding_repo_scan	4	2	2	2.0x
document_research	3	2	1	1.5x
api_fanout	3	2	1	1.5x
false_collapse_safety	4	4	0	1.0x
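
The headline totals follow directly from the scenario rows. The short check below assumes that the dedup figure is tasks divided by executions, which matches every row in the table; the variable and function names are illustrative.

```python
# (tasks requested, actual executions) per scenario, copied from the table
scenarios = {
    "coding_repo_scan": (4, 2),
    "document_research": (3, 2),
    "api_fanout": (3, 2),
    "false_collapse_safety": (4, 4),
}


def summarize(rows):
    """Aggregate totals; dedup is assumed to mean tasks / executions."""
    total_tasks = sum(tasks for tasks, _ in rows.values())
    total_execs = sum(execs for _, execs in rows.values())
    return {
        "tasks": total_tasks,
        "execs": total_execs,
        "saved": total_tasks - total_execs,
        "dedup": total_tasks / total_execs,
    }


summary = summarize(scenarios)
per_row = {name: tasks / execs for name, (tasks, execs) in scenarios.items()}
```

Under that assumption the rows reproduce the 14 requested tasks, 10 executions, and 4 saved executions, for an overall ratio of 1.4x across the harness.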

Benchmark summary view showing tasks, executions, saved executions, and dedup ratios across the current local harness.

Try it in 2 minutes

Start the local runtime with three commands.

Open the repo →

cd runtime/shared_execution/backend
pip install -r requirements.txt
uvicorn app.main:app --reload