AI Infrastructure · Disaggregated Inference · Experimental Results

Seam Orchestrator: Workload-Aware KV Routing

Experiments in admissibility, capacity-aware routing, hysteresis, and replacement-feasible glue layers above transport — for disaggregated inference systems where the policy question matters as much as the transport question.

By Manish KL · April 2026 · ~15 min read · Technical Essay
Abstract

This project is explicitly not an entry in the transport-replacement wars. It sits above mock, NIXL, and UCX-style backends and asks a different question: not only whether bytes can move, but whether a path should be used for a specific workload right now. The result is a control plane that tracks non-binary PathState, computes GFS, PRS, and FAE scores, incorporates workload profiles and capacity state, and emits decision records explaining why each candidate path was chosen, skipped, or rejected.

92% · strict workloads preserved on healthy paths
78% · tolerant degraded-path utilization
88% · healthy headroom preservation
31 · oscillations avoided by hysteresis

Why this matters now

Disaggregated inference changes the operating surface of the stack. Once prefill and decode live on different systems, the software problem is no longer only "move KV cache quickly." It becomes "decide which candidate path is appropriate for which workload under current conditions." That is why transport alone does not answer the whole systems question.

Seam Orchestrator turns that seam into an explicit control plane. It tracks non-binary PathState, computes GFS, PRS, and FAE scores, incorporates workload profiles and capacity state, and emits structured decision records explaining every routing outcome.

The interesting software layer in disaggregated inference is not only the one that can move bytes. It is the one that can decide whether a still-alive path deserves trust for a given workload.
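To make the idea of a decision record concrete, here is a minimal sketch in Python. The field names, the PathState variants, and the `explain` helper are illustrative assumptions, not the repository's actual schema; the real records presumably also carry the GFS, PRS, and FAE scores described above.

```python
from dataclasses import dataclass
from enum import Enum

class PathState(Enum):
    """Non-binary path health, as the text describes (variant names assumed)."""
    HEALTHY = "healthy"
    DEGRADED_USABLE = "degraded_usable"
    DEGRADED_UNUSABLE = "degraded_unusable"

@dataclass
class DecisionRecord:
    """Structured explanation of one routing outcome (illustrative schema)."""
    workload: str
    candidate_path: str
    path_state: PathState
    admitted: bool
    reason: str

def explain(record: DecisionRecord) -> str:
    """Render a record as a one-line human-readable explanation."""
    verdict = "ADMITTED" if record.admitted else "REJECTED"
    return (f"{record.workload} -> {record.candidate_path} "
            f"[{record.path_state.value}]: {verdict} ({record.reason})")

rec = DecisionRecord("BATCH", "path-a", PathState.DEGRADED_USABLE,
                     True, "jitter tolerance covers degraded state")
print(explain(rec))
# BATCH -> path-a [degraded_usable]: ADMITTED (jitter tolerance covers degraded state)
```

The point of the record is auditability: every admit, skip, and reject is explainable after the fact, not just observable.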

The architecture: policy above transport

The cleanest thing about this design is the boundary it draws. Backends move bytes. The orchestrator makes judgments. That distinction is not just tidy — it is strategically important. It means backend choice can vary while the policy layer remains coherent.

[Figure 1 diagram: two stacks compared. Public-style stack (NIXL), transport-oriented: Trainium/prefill node and KV producer → NIXL KV transfer layer (descriptors, memory registration, backend abstraction) → EFA/libfabric transport → host-side receive → decode system. Seam Orchestrator path, policy-oriented: prefill session/KV producer → Seam Orchestrator (PathState, GFS, PRS, FAE; workload-aware admissibility; capacity-aware routing; hysteresis; decision records) over swappable Mock/NIXL/UCX backends → decode pool/consumer. The same path can be admissible for BATCH and inadmissible for STRICT.]
Figure 1. Seam Orchestrator as a policy/control layer above swappable backends. The key distinction: the backend path is replaceable; the higher-leverage layer is the orchestration policy above it.

This design matters for two reasons. First, it keeps the project honest: it does not pretend to be a full NIXL clone or a transport benchmark. Second, it reveals where the software leverage may actually live. Once backend movement exists at all, the harder and more strategic question is whether a path is admissible for a given workload under current latency, jitter, capacity, and propagation conditions.
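The boundary the architecture draws can be sketched as a narrow byte-moving contract that the policy layer depends on, with backends swapped behind it. This is a hypothetical interface, not the repository's real API; the method name, signature, and the fault-injecting mock are assumptions for illustration.

```python
from typing import Protocol

class TransferBackend(Protocol):
    """Byte-moving contract. The orchestrator depends only on this surface,
    so Mock/NIXL/UCX-style backends can be swapped behind it (names assumed)."""
    def transfer(self, src: str, dst: str, nbytes: int) -> bool: ...

class MockBackend:
    """Fault-injecting stand-in: fails any transfer above a size cap."""
    def __init__(self, fail_above: int = 1 << 20):
        self.fail_above = fail_above

    def transfer(self, src: str, dst: str, nbytes: int) -> bool:
        return nbytes <= self.fail_above

def route_and_send(backend: TransferBackend, src: str, dst: str, nbytes: int) -> str:
    # Policy lives here, above transport: decide first, then delegate
    # the actual byte movement to whichever backend is plugged in.
    ok = backend.transfer(src, dst, nbytes)
    return "delivered" if ok else "failed"

print(route_and_send(MockBackend(), "prefill-0", "decode-3", 4096))  # delivered
```

Because the policy layer sees only the protocol, swapping backends changes nothing above the line, which is exactly the coherence claim in the text.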

Key terminology: pool = decode resource group · path = route to that pool · candidate = pool + path + current state snapshot · decision record = structured explanation of the routing outcome
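The terminology above maps naturally onto a small data model. The field names and types here are guesses for illustration, not the repository's actual structures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pool:
    """Decode resource group."""
    name: str
    occupancy: float  # fraction of capacity in use, 0.0 to 1.0

@dataclass(frozen=True)
class PathSnapshot:
    """Route to a pool plus its current measured state (fields assumed)."""
    path_id: str
    p99_latency_ms: float
    jitter: float  # normalized 0.0 to 1.0
    state: str     # e.g. "HEALTHY", "DEGRADED_USABLE"

@dataclass(frozen=True)
class Candidate:
    """Pool + path + state snapshot, as the terminology line defines it."""
    pool: Pool
    path: PathSnapshot

c = Candidate(Pool("decode-east", 0.40),
              PathSnapshot("path-a", 22.5, 0.2, "HEALTHY"))
print(c.path.path_id, c.pool.occupancy)
```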

Can the glue layer be replaced?

In prototype form, yes. That is one of the most useful things the project shows. The uploaded package supports swappable backends and includes a replacement-path note that carefully distinguishes prototype feasibility from production parity.

That distinction is important. Demonstrating a replacement-capable path is not the same as claiming equivalence to hardened production stacks. But the narrow claim is already strategically meaningful: the glue layer is not magic, swappable backends are plausible, and therefore the enduring control point may shift upward into interface ownership and policy logic above transport.

Scenario E: the category-defining proof

Scenario E remains the strongest single artifact in the repository because it demonstrates the whole thesis in a compact table. Same path. Same transfer mechanism. Different admissibility by workload.

Scenario E: Same Path, Different Admissibility
Path state: DEGRADED_USABLE · same transfer mechanism for all workloads

Workload     | SLA   | Jitter tol. | Admitted? | Reason
BATCH        | 200ms | 0.9         | YES       | admitted despite DEGRADED_USABLE
INTERACTIVE  | 30ms  | 0.3         | NO        | jitter budget exceeded
RELEASE      | 15ms  | 0.1         | NO        | p99 latency above SLA threshold
ASYNC-BATCH  | 100ms | 0.9         | YES       | admitted despite degradation
STRICT-SYNC  | 20ms  | 0.2         | NO        | jitter intolerance on degraded path
TOLERANT-MED | 50ms  | 0.7         | YES       | acceptable under current degraded state
Figure 2. Scenario E — same path, DEGRADED_USABLE state, different admissibility per workload. This is the category statement: transport cannot answer this question alone.

This is what turns the repo from a scoring toy into a category statement. If every alive path were equally appropriate for every workload, then transport would dominate and the rest of the control plane would be ornamental. Scenario E proves the opposite.
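The core of the Scenario E verdicts can be sketched as a two-condition admissibility check. The measured path numbers (`PATH_P99_MS`, `PATH_JITTER`) are invented here, and the real rule is richer (it incorporates the GFS/PRS/FAE scores); this is only a minimal sketch of how one path yields different verdicts per workload.

```python
from dataclasses import dataclass

@dataclass
class WorkloadProfile:
    name: str
    sla_ms: float            # latency budget
    jitter_tolerance: float  # 0.0 (strict) to 1.0 (tolerant)

# Hypothetical measured state of one DEGRADED_USABLE path.
PATH_P99_MS = 45.0
PATH_JITTER = 0.6

def admissible(w: WorkloadProfile, p99_ms: float, jitter: float) -> bool:
    """Same path, different verdict per workload: both the latency budget
    and the jitter tolerance must cover the path's current state."""
    return p99_ms <= w.sla_ms and jitter <= w.jitter_tolerance

for w in [WorkloadProfile("BATCH", 200, 0.9),
          WorkloadProfile("INTERACTIVE", 30, 0.3),
          WorkloadProfile("TOLERANT-MED", 50, 0.7)]:
    print(w.name, "YES" if admissible(w, PATH_P99_MS, PATH_JITTER) else "NO")
# BATCH YES / INTERACTIVE NO / TOLERANT-MED YES
```

Even this toy version reproduces the shape of the table: tolerant classes keep using the degraded path, strict classes lose it.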

Scenario F: health vs headroom

Scenario F extends the thesis under capacity pressure. The healthiest path is not always the best path if healthy headroom should be preserved for stricter workloads. That is where Seam Orchestrator stops behaving like a path-quality ranker and starts behaving like a real policy layer.

[Figure 3 diagram: two panels under the same capacity state, "healthy headroom preserved for strict workloads · tolerant traffic redirected to degraded-but-usable path". Left, strict traffic: healthy path 87% full and near saturation, degraded-usable path 30% full with slack; policy preserves the healthy path for strict work rather than filling the remaining 13% with tolerant work. Right, tolerant traffic: healthy path reserved; tolerant work shifted to the degraded-usable path, an acceptable degradation for the tolerant class.]
Figure 3. Scenario F — capacity-aware headroom preservation. The system deliberately routes tolerant work away from the healthy path to keep headroom for strict traffic.
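The Scenario F policy can be sketched as a headroom-reserving router. The `HEADROOM_FLOOR` knob and the occupancy numbers are assumptions taken from the figure's 87%/30% example, not real configuration.

```python
HEADROOM_FLOOR = 0.15  # reserve the last 15% of healthy capacity (assumed knob)

def choose_pool(workload_class: str, healthy_occ: float, degraded_occ: float) -> str:
    """Capacity-aware routing sketch: strict traffic keeps the healthy path;
    tolerant traffic is shifted off it once headroom gets scarce."""
    healthy_headroom = 1.0 - healthy_occ
    if workload_class == "strict":
        return "healthy"
    # Tolerant work: don't consume the headroom reserved for strict traffic.
    if healthy_headroom <= HEADROOM_FLOOR and degraded_occ < 1.0:
        return "degraded_usable"
    return "healthy"

# The Scenario F state: healthy path 87% full, degraded path 30% full.
print(choose_pool("strict", 0.87, 0.30))    # healthy
print(choose_pool("tolerant", 0.87, 0.30))  # degraded_usable
print(choose_pool("tolerant", 0.40, 0.30))  # healthy (plenty of headroom)
```

The design choice worth noticing: the healthiest path is deliberately not given to the traffic that least needs it.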

Experiments: from demos to evidence

The uploaded package includes more than hand-built scenarios — it carries experiment summaries, evaluation artifacts, and baseline comparisons. The experiment families test admissibility boundaries, capacity-pressure tradeoffs, hysteresis stability, and alternate-path scarcity.

[Figure 4 charts. Left panel, "Admissibility Boundary Sweep": admissibility rate (0-100%) vs path degradation severity (low to high), with curves for batch/tolerant, interactive, and release-strict workloads; admissibility peels away by workload class rather than all at once. Right panel, "Capacity-Pressure Tradeoff": routing distribution (0-100%) vs healthy-path occupancy (0-100%); strict traffic stays on the healthy path while tolerant traffic shifts away, and the policy balances across health, headroom, and workload strictness.]
Figure 4. Left: admissibility boundary sweep — strict workloads lose trust faster as degradation increases. Right: capacity-pressure tradeoff — tolerant work shifts to degraded paths as healthy occupancy rises.

Hysteresis stability

Hysteresis is not cosmetic. A gray-failure controller that flaps is not operationally believable. The staged restore model avoids immediate snap-back after a single clean sample, instead requiring cleaner windows before returning to healthy. A control plane that escalates fast and restores slowly under noise is much easier to trust than one that rapidly oscillates.
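The "escalate fast, restore slowly" behavior can be sketched as an asymmetric state machine. The window size and state names here are illustrative, not the repository's actual thresholds.

```python
class StagedRestore:
    """Asymmetric hysteresis sketch: escalate on a single bad sample,
    restore only after `restore_window` consecutive clean samples."""

    def __init__(self, restore_window: int = 5):
        self.restore_window = restore_window
        self.state = "HEALTHY"
        self.clean_streak = 0

    def observe(self, sample_ok: bool) -> str:
        if not sample_ok:
            self.state = "DEGRADED_USABLE"  # escalate fast
            self.clean_streak = 0
        elif self.state == "DEGRADED_USABLE":
            self.clean_streak += 1          # restore slowly
            if self.clean_streak >= self.restore_window:
                self.state = "HEALTHY"
                self.clean_streak = 0
        return self.state

h = StagedRestore(restore_window=3)
# One clean sample after a failure does NOT snap back to healthy.
print(h.observe(False))  # DEGRADED_USABLE
print(h.observe(True))   # DEGRADED_USABLE
print(h.observe(True))   # DEGRADED_USABLE
print(h.observe(True))   # HEALTHY
```

The asymmetry is what prevents flapping: a single clean sample never restores, but a single bad sample always demotes.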

What the baselines show

The baseline comparison answers the inevitable skepticism: why not just pick the lowest-latency path, the healthiest path, or the least occupied path? Because each naive baseline throws away a different part of the control problem.

Baseline           | What it preserves        | What it loses
lowest latency     | raw fast-path bias       | controlled use of degraded-but-usable capacity
binary health only | simple exclusion logic   | workload-relative admissibility nuance
capacity only      | headroom awareness       | strict-workload protection under degraded conditions
Seam Orchestrator  | policy balance: health + headroom + workload strictness + explainability | more complexity, in service of a richer control objective
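Each naive baseline reduces to a one-line selector, and on the same candidate set they disagree, which is the whole point. The candidates below are invented toy data; the tuple fields (latency, health flag, occupancy) are assumptions for illustration.

```python
# Candidates: (path_name, p99_latency_ms, healthy?, occupancy). Toy data.
CANDIDATES = [
    ("fast-but-degraded", 12.0, False, 0.30),
    ("healthy-but-full",  25.0, True,  0.87),
    ("healthy-slack",     40.0, True,  0.40),
]

def lowest_latency(cands):
    # Ignores health and occupancy entirely.
    return min(cands, key=lambda c: c[1])[0]

def binary_health(cands):
    # Excludes degraded paths outright, then picks the fastest survivor.
    return min((c for c in cands if c[2]), key=lambda c: c[1])[0]

def capacity_only(cands):
    # Ignores health and latency; chases the emptiest path.
    return min(cands, key=lambda c: c[3])[0]

print(lowest_latency(CANDIDATES))  # fast-but-degraded
print(binary_health(CANDIDATES))   # healthy-but-full
print(capacity_only(CANDIDATES))   # fast-but-degraded
```

Each selector discards a different dimension of the control problem, which is why a policy that balances all three, per workload class, behaves differently from any one of them.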

What this means

The narrow conclusion is that a replacement-capable prototype path exists for the glue layer. The broader conclusion is that once backend choice becomes abstractable, the more durable software leverage may move upward into interface ownership and policy logic.

In practical terms, the strategic moat in disaggregated inference may no longer be only "who owns the backend that moves bytes?" It may increasingly be "who owns the control plane that decides which paths deserve trust, for which workloads, and under what capacity conditions?"

The glue layer is replaceable in principle. The higher-leverage software question is who owns the policy layer that decides which paths deserve trust.

Limits and next steps

This is not claiming production parity with hardened stacks. It is claiming something narrower and credible: replacement feasibility, backend swappability, and a useful orchestration layer above transport.

The next clean extensions are obvious: broader embodiments beyond KV routing, richer replay tooling, more topology-aware experiments, and stronger presentation artifacts. But even in its current form, this is already more than a concept. It is a compact systems artifact for a real category.