AI Infrastructure · Disaggregated Inference · Experimental Results

Seam Orchestrator: Workload-Aware KV Routing

Experiments in admissibility, capacity-aware routing, hysteresis, and replacement-feasible glue layers above transport — for disaggregated inference systems where the policy question matters as much as the transport question.

By Manish KL · April 2026 · ~15 min read · Technical Essay
Abstract

This project is explicitly not an entry in the transport-replacement wars. It sits above mock, NIXL, and UCX-style backends and asks a different question: not only whether bytes can move, but whether a path should be used for a specific workload right now. The result is a control plane that tracks non-binary PathState, computes GFS, PRS, and FAE scores, incorporates workload profiles and capacity state, and emits decision records explaining why each candidate path was chosen, skipped, or rejected.

92% · strict workloads preserved on healthy paths
78% · tolerant degraded-path utilization
88% · healthy headroom preservation
31 · oscillations avoided by hysteresis

Why this matters now

Disaggregated inference changes the operating surface of the stack. Once prefill and decode live on different systems, the software problem is no longer only "move KV cache quickly." It becomes "decide which candidate path is appropriate for which workload under current conditions." That is why transport alone does not answer the whole systems question.

Seam Orchestrator turns that seam into an explicit control plane. It tracks non-binary PathState, computes GFS, PRS, and FAE scores, incorporates workload profiles and capacity state, and emits structured decision records explaining every routing outcome.

The interesting software layer in disaggregated inference is not only the one that can move bytes. It is the one that can decide whether a still-alive path deserves trust for a given workload.
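To make the idea of a decision record concrete, here is a minimal sketch in Python. The field names, the PathState variants, and the `explain` helper are illustrative assumptions, not the repository's actual schema; the real records presumably also carry the GFS, PRS, and FAE scores described above.

```python
from dataclasses import dataclass
from enum import Enum

class PathState(Enum):
    """Non-binary path health, as the text describes (variant names assumed)."""
    HEALTHY = "healthy"
    DEGRADED_USABLE = "degraded_usable"
    DEGRADED_UNUSABLE = "degraded_unusable"

@dataclass
class DecisionRecord:
    """Structured explanation of one routing outcome (illustrative schema)."""
    workload: str
    candidate_path: str
    path_state: PathState
    admitted: bool
    reason: str

def explain(record: DecisionRecord) -> str:
    """Render a record as a one-line human-readable explanation."""
    verdict = "ADMITTED" if record.admitted else "REJECTED"
    return (f"{record.workload} -> {record.candidate_path} "
            f"[{record.path_state.value}]: {verdict} ({record.reason})")

rec = DecisionRecord("BATCH", "path-a", PathState.DEGRADED_USABLE,
                     True, "jitter tolerance covers degraded state")
print(explain(rec))
# BATCH -> path-a [degraded_usable]: ADMITTED (jitter tolerance covers degraded state)
```

The point of the record is auditability: every admit, skip, and reject is explainable after the fact, not just observable.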

The architecture: policy above transport

The cleanest thing about this design is the boundary it draws. Backends move bytes. The orchestrator makes judgments. That distinction is not just tidy — it is strategically important. It means backend choice can vary while the policy layer remains coherent.

[Figure 1 diagram: two stacks compared. Public-style stack (NIXL), transport-oriented: Trainium/prefill node and KV producer → NIXL KV transfer layer (descriptors, memory registration, backend abstraction) → EFA/libfabric transport → host-side receive → decode system. Seam Orchestrator path, policy-oriented: prefill session/KV producer → Seam Orchestrator (PathState, GFS, PRS, FAE; workload-aware admissibility; capacity-aware routing; hysteresis; decision records) over swappable Mock/NIXL/UCX backends → decode pool/consumer. The same path can be admissible for BATCH and inadmissible for STRICT.]
Figure 1. Seam Orchestrator as a policy/control layer above swappable backends. The key distinction: the backend path is replaceable; the higher-leverage layer is the orchestration policy above it.

This design matters for two reasons. First, it keeps the project honest: it does not pretend to be a full NIXL clone or a transport benchmark. Second, it reveals where the software leverage may actually live. Once backend movement exists at all, the harder and more strategic question is whether a path is admissible for a given workload under current latency, jitter, capacity, and propagation conditions.
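The boundary the architecture draws can be sketched as a narrow byte-moving contract that the policy layer depends on, with backends swapped behind it. This is a hypothetical interface, not the repository's real API; the method name, signature, and the fault-injecting mock are assumptions for illustration.

```python
from typing import Protocol

class TransferBackend(Protocol):
    """Byte-moving contract. The orchestrator depends only on this surface,
    so Mock/NIXL/UCX-style backends can be swapped behind it (names assumed)."""
    def transfer(self, src: str, dst: str, nbytes: int) -> bool: ...

class MockBackend:
    """Fault-injecting stand-in: fails any transfer above a size cap."""
    def __init__(self, fail_above: int = 1 << 20):
        self.fail_above = fail_above

    def transfer(self, src: str, dst: str, nbytes: int) -> bool:
        return nbytes <= self.fail_above

def route_and_send(backend: TransferBackend, src: str, dst: str, nbytes: int) -> str:
    # Policy lives here, above transport: decide first, then delegate
    # the actual byte movement to whichever backend is plugged in.
    ok = backend.transfer(src, dst, nbytes)
    return "delivered" if ok else "failed"

print(route_and_send(MockBackend(), "prefill-0", "decode-3", 4096))  # delivered
```

Because the policy layer sees only the protocol, swapping backends changes nothing above the line, which is exactly the coherence claim in the text.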

Key terminology: pool = decode resource group · path = route to that pool · candidate = pool + path + current state snapshot · decision record = structured explanation of the routing outcome
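The terminology above maps naturally onto a small data model. The field names and types here are guesses for illustration, not the repository's actual structures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pool:
    """Decode resource group."""
    name: str
    occupancy: float  # fraction of capacity in use, 0.0 to 1.0

@dataclass(frozen=True)
class PathSnapshot:
    """Route to a pool plus its current measured state (fields assumed)."""
    path_id: str
    p99_latency_ms: float
    jitter: float  # normalized 0.0 to 1.0
    state: str     # e.g. "HEALTHY", "DEGRADED_USABLE"

@dataclass(frozen=True)
class Candidate:
    """Pool + path + state snapshot, as the terminology line defines it."""
    pool: Pool
    path: PathSnapshot

c = Candidate(Pool("decode-east", 0.40),
              PathSnapshot("path-a", 22.5, 0.2, "HEALTHY"))
print(c.path.path_id, c.pool.occupancy)
```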

Can the glue layer be replaced?

In prototype form, yes. That is one of the most useful things the project shows. The uploaded package supports swappable backends and includes a replacement-path note that carefully distinguishes prototype feasibility from production parity.

That distinction is important. Demonstrating a replacement-capable path is not the same as claiming equivalence to hardened production stacks. But the narrow claim is already strategically meaningful: the glue layer is not magic, swappable backends are plausible, and therefore the enduring control point may shift upward into interface ownership and policy logic above transport.

Scenario E: the category-defining proof

Scenario E remains the strongest single artifact in the repository because it demonstrates the whole thesis in a compact table. Same path. Same transfer mechanism. Different admissibility by workload.

Scenario E: Same Path, Different Admissibility
Path state: DEGRADED_USABLE · same transfer mechanism for all workloads

Workload     | SLA   | Jitter tol. | Admitted? | Reason
BATCH        | 200ms | 0.9         | YES       | admitted despite DEGRADED_USABLE
INTERACTIVE  | 30ms  | 0.3         | NO        | jitter budget exceeded
RELEASE      | 15ms  | 0.1         | NO        | p99 latency above SLA threshold
ASYNC-BATCH  | 100ms | 0.9         | YES       | admitted despite degradation
STRICT-SYNC  | 20ms  | 0.2         | NO        | jitter intolerance on degraded path
TOLERANT-MED | 50ms  | 0.7         | YES       | acceptable under current degraded state
Figure 2. Scenario E — same path, DEGRADED_USABLE state, different admissibility per workload. This is the category statement: transport cannot answer this question alone.

This is what turns the repo from a scoring toy into a category statement. If every alive path were equally appropriate for every workload, then transport would dominate and the rest of the control plane would be ornamental. Scenario E proves the opposite.
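The core of the Scenario E verdicts can be sketched as a two-condition admissibility check. The measured path numbers (`PATH_P99_MS`, `PATH_JITTER`) are invented here, and the real rule is richer (it incorporates the GFS/PRS/FAE scores); this is only a minimal sketch of how one path yields different verdicts per workload.

```python
from dataclasses import dataclass

@dataclass
class WorkloadProfile:
    name: str
    sla_ms: float            # latency budget
    jitter_tolerance: float  # 0.0 (strict) to 1.0 (tolerant)

# Hypothetical measured state of one DEGRADED_USABLE path.
PATH_P99_MS = 45.0
PATH_JITTER = 0.6

def admissible(w: WorkloadProfile, p99_ms: float, jitter: float) -> bool:
    """Same path, different verdict per workload: both the latency budget
    and the jitter tolerance must cover the path's current state."""
    return p99_ms <= w.sla_ms and jitter <= w.jitter_tolerance

for w in [WorkloadProfile("BATCH", 200, 0.9),
          WorkloadProfile("INTERACTIVE", 30, 0.3),
          WorkloadProfile("TOLERANT-MED", 50, 0.7)]:
    print(w.name, "YES" if admissible(w, PATH_P99_MS, PATH_JITTER) else "NO")
# BATCH YES / INTERACTIVE NO / TOLERANT-MED YES
```

Even this toy version reproduces the shape of the table: tolerant classes keep using the degraded path, strict classes lose it.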

Scenario F: health vs headroom

Scenario F extends the thesis under capacity pressure. The healthiest path is not always the best path if healthy headroom should be preserved for stricter workloads. That is where Seam Orchestrator stops behaving like a path-quality ranker and starts behaving like a real policy layer.

[Figure 3 diagram: two panels under the same capacity state, "healthy headroom preserved for strict workloads · tolerant traffic redirected to degraded-but-usable path". Left, strict traffic: healthy path 87% full and near saturation, degraded-usable path 30% full with slack; policy preserves the healthy path for strict work rather than filling the remaining 13% with tolerant work. Right, tolerant traffic: healthy path reserved; tolerant work shifted to the degraded-usable path, an acceptable degradation for the tolerant class.]
Figure 3. Scenario F — capacity-aware headroom preservation. The system deliberately routes tolerant work away from the healthy path to keep headroom for strict traffic.
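The Scenario F policy can be sketched as a headroom-reserving router. The `HEADROOM_FLOOR` knob and the occupancy numbers are assumptions taken from the figure's 87%/30% example, not real configuration.

```python
HEADROOM_FLOOR = 0.15  # reserve the last 15% of healthy capacity (assumed knob)

def choose_pool(workload_class: str, healthy_occ: float, degraded_occ: float) -> str:
    """Capacity-aware routing sketch: strict traffic keeps the healthy path;
    tolerant traffic is shifted off it once headroom gets scarce."""
    healthy_headroom = 1.0 - healthy_occ
    if workload_class == "strict":
        return "healthy"
    # Tolerant work: don't consume the headroom reserved for strict traffic.
    if healthy_headroom <= HEADROOM_FLOOR and degraded_occ < 1.0:
        return "degraded_usable"
    return "healthy"

# The Scenario F state: healthy path 87% full, degraded path 30% full.
print(choose_pool("strict", 0.87, 0.30))    # healthy
print(choose_pool("tolerant", 0.87, 0.30))  # degraded_usable
print(choose_pool("tolerant", 0.40, 0.30))  # healthy (plenty of headroom)
```

The design choice worth noticing: the healthiest path is deliberately not given to the traffic that least needs it.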

Experiments: from demos to evidence

The uploaded package includes more than hand-built scenarios — it carries experiment summaries, evaluation artifacts, and baseline comparisons. The experiment families test admissibility boundaries, capacity-pressure tradeoffs, hysteresis stability, and alternate-path scarcity.

[Figure 4 charts. Left panel, "Admissibility Boundary Sweep": admissibility rate (0-100%) vs path degradation severity (low to high), with curves for batch/tolerant, interactive, and release-strict workloads; admissibility peels away by workload class rather than all at once. Right panel, "Capacity-Pressure Tradeoff": routing distribution (0-100%) vs healthy-path occupancy (0-100%); strict traffic stays on the healthy path while tolerant traffic shifts away, and the policy balances across health, headroom, and workload strictness.]
Figure 4. Left: admissibility boundary sweep — strict workloads lose trust faster as degradation increases. Right: capacity-pressure tradeoff — tolerant work shifts to degraded paths as healthy occupancy rises.

Hysteresis stability

Hysteresis is not cosmetic. A gray-failure controller that flaps is not operationally believable. The staged restore model avoids immediate snap-back after a single clean sample, instead requiring cleaner windows before returning to healthy. A control plane that escalates fast and restores slowly under noise is much easier to trust than one that rapidly oscillates.
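The "escalate fast, restore slowly" behavior can be sketched as an asymmetric state machine. The window size and state names here are illustrative, not the repository's actual thresholds.

```python
class StagedRestore:
    """Asymmetric hysteresis sketch: escalate on a single bad sample,
    restore only after `restore_window` consecutive clean samples."""

    def __init__(self, restore_window: int = 5):
        self.restore_window = restore_window
        self.state = "HEALTHY"
        self.clean_streak = 0

    def observe(self, sample_ok: bool) -> str:
        if not sample_ok:
            self.state = "DEGRADED_USABLE"  # escalate fast
            self.clean_streak = 0
        elif self.state == "DEGRADED_USABLE":
            self.clean_streak += 1          # restore slowly
            if self.clean_streak >= self.restore_window:
                self.state = "HEALTHY"
                self.clean_streak = 0
        return self.state

h = StagedRestore(restore_window=3)
# One clean sample after a failure does NOT snap back to healthy.
print(h.observe(False))  # DEGRADED_USABLE
print(h.observe(True))   # DEGRADED_USABLE
print(h.observe(True))   # DEGRADED_USABLE
print(h.observe(True))   # HEALTHY
```

The asymmetry is what prevents flapping: a single clean sample never restores, but a single bad sample always demotes.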

What the baselines show

The baseline comparison answers the inevitable skepticism: why not just pick the lowest-latency path, the healthiest path, or the least occupied path? Because each naive baseline throws away a different part of the control problem.

Baseline           | What it preserves        | What it loses
lowest latency     | raw fast-path bias       | controlled use of degraded-but-usable capacity
binary health only | simple exclusion logic   | workload-relative admissibility nuance
capacity only      | headroom awareness       | strict-workload protection under degraded conditions
Seam Orchestrator  | policy balance: health + headroom + workload strictness + explainability | more complexity, in service of a richer control objective
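Each naive baseline reduces to a one-line selector, and on the same candidate set they disagree, which is the whole point. The candidates below are invented toy data; the tuple fields (latency, health flag, occupancy) are assumptions for illustration.

```python
# Candidates: (path_name, p99_latency_ms, healthy?, occupancy). Toy data.
CANDIDATES = [
    ("fast-but-degraded", 12.0, False, 0.30),
    ("healthy-but-full",  25.0, True,  0.87),
    ("healthy-slack",     40.0, True,  0.40),
]

def lowest_latency(cands):
    # Ignores health and occupancy entirely.
    return min(cands, key=lambda c: c[1])[0]

def binary_health(cands):
    # Excludes degraded paths outright, then picks the fastest survivor.
    return min((c for c in cands if c[2]), key=lambda c: c[1])[0]

def capacity_only(cands):
    # Ignores health and latency; chases the emptiest path.
    return min(cands, key=lambda c: c[3])[0]

print(lowest_latency(CANDIDATES))  # fast-but-degraded
print(binary_health(CANDIDATES))   # healthy-but-full
print(capacity_only(CANDIDATES))   # fast-but-degraded
```

Each selector discards a different dimension of the control problem, which is why a policy that balances all three, per workload class, behaves differently from any one of them.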

What this means

The narrow conclusion is that a replacement-capable prototype path exists for the glue layer. The broader conclusion is that once backend choice becomes abstractable, the more durable software leverage may move upward into interface ownership and policy logic.

In practical terms, the strategic moat in disaggregated inference may no longer be only "who owns the backend that moves bytes?" It may increasingly be "who owns the control plane that decides which paths deserve trust, for which workloads, and under what capacity conditions?"

The glue layer is replaceable in principle. The higher-leverage software question is who owns the policy layer that decides which paths deserve trust.

Limits and next steps

This is not claiming production parity with hardened stacks. It is claiming something narrower and credible: replacement feasibility, backend swappability, and a useful orchestration layer above transport.

The next clean extensions are obvious: broader embodiments beyond KV routing, richer replay tooling, more topology-aware experiments, and stronger presentation artifacts. But even in its current form, this is already more than a concept. It is a compact systems artifact for a real category.