AI Infrastructure / Photonics Series / Essay 3 — Deepened Edition

The Real AI Bottleneck Is Moving From Compute to Interconnect Power Density

For years the conversation was about FLOPS. Then it became about memory bandwidth. Now the harder wall is not raw transport speed but the full-stack cost of moving bits: SerDes power, retimers, DSP burden, switch radix pressure, optical engines, cooling overhead, routing complexity, and the topology tax of making many accelerators behave like one machine.

The future of AI infrastructure will not be won by the chip with the biggest headline throughput alone. It will be won by the system that can move data across a cluster without turning interconnect into the dominant power, heat, packaging, and scheduling problem.

Theme: power per useful bit. Focus: scale-up + scale-out fabrics. Bridge: memory scheduling → optical scheduling.

Yesterday: More accelerators, faster SerDes, bigger training runs.
Today: Link power, retimers, DSPs, thermals, cabling, and topology overhead.
Tomorrow: Fabric becomes an allocatable system resource, not background plumbing.
Real test: Can the machine move tensors cheaply enough to preserve effective compute?

1. Why FLOPS stopped being enough

The AI industry spent its first wave talking as if the main problem were arithmetic throughput. That made sense when the question was whether accelerators could do enough matrix math to keep frontier models training at all. But once clusters became large, multi-tiered, and communication-heavy, the bottleneck started moving outward.

The modern AI system is no longer one chip solving one problem. It is a choreography problem across accelerators, high-radix switches, retimers, optical modules, NICs, host memory, local storage, and remote racks. The system’s effective performance is increasingly controlled by how expensive it is to move activations, gradients, checkpoints, expert traffic, optimizer state, and inference context across the fabric.

A cluster does not get value from raw bandwidth alone. It gets value from delivered bandwidth that arrives at the right time, at tolerable energy cost, without collapsing thermals or operational simplicity.

2. What interconnect power density really means

“Interconnect power density” is not just watts per module. It is the compounded cost of pushing more and more communication through a fixed amount of electrical escape, board area, package margin, cooling headroom, and rack envelope.

Compute-centric thinking

  • Add more accelerators
  • Increase per-chip throughput
  • Assume the network will scale behind it

Fabric-centric reality

  • Every extra bit moved costs power twice: once in transport, once in the cooling that removes the resulting heat
  • Signal conditioning, retimers, DSPs, and packaging all accumulate
  • Cooling, serviceability, and layout become first-order architectural constraints

3. A practical power-budget decomposition

The easiest way to make this concrete is to stop thinking of “network power” as one line item. In real AI systems, the transport path often includes several distinct budgets that each rise under scale pressure.

Interconnect cost is a stack, not a single number

| Budget component | What it does | Why it grows | Architectural consequence |
|---|---|---|---|
| SerDes / PHY | Drives and receives high-speed electrical lanes | Higher lane rates and denser escape stress equalization and signal integrity | Package edges and local board design become harder |
| Retimers / signal conditioning | Recover margin on difficult electrical paths | Longer or noisier traces need more help | Extra power and extra heat for mere transport |
| DSP / optical module logic | Supports optical signaling and recovery | Longer reach and more complex modulation raise burden | Module-level watts become a systems concern |
| Switch silicon | Moves traffic through high-radix fabrics | More ports and more concurrent flows raise radix and scheduling pressure | Fabric design becomes topology-sensitive |
| Cooling overhead | Removes heat generated by all of the above | Heat is concentrated in dense zones, not spread evenly | Transport watts cascade into cooling watts |

This is why "bits moved" and "useful compute delivered" diverge: the machine pays several times for the same tensor movement.
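To make the stacked budget concrete, here is a minimal sketch that sums hypothetical per-component energy costs into wall power at a given bandwidth. Every figure and name below is an illustrative assumption, not a measurement.

```python
# Hypothetical per-component energy costs in pJ/bit; illustrative only.
BUDGET_PJ_PER_BIT = {
    "serdes_phy": 4.0,
    "retimers": 3.0,
    "dsp_optical_module": 8.0,
    "switch_silicon": 5.0,
}

# Assume roughly 30% of transport watts reappear as cooling watts.
COOLING_OVERHEAD = 0.3

def transport_watts(bandwidth_tbps: float) -> float:
    """Wall power to move bits at a given bandwidth, cooling included."""
    pj_per_bit = sum(BUDGET_PJ_PER_BIT.values())        # the whole stack
    watts = pj_per_bit * 1e-12 * bandwidth_tbps * 1e12  # pJ/bit * bits/s
    return watts * (1.0 + COOLING_OVERHEAD)

print(round(transport_watts(10.0), 1))  # 10 Tb/s through a 20 pJ/bit stack
```

The point of the sketch is the shape, not the numbers: the total is a sum of independent budgets, and cooling multiplies whatever the transport stack already spent.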

4. The hidden topology tax of collectives

AI clusters do not move traffic uniformly. They produce synchronized bursts: all-reduce, all-gather, expert dispatch, parameter synchronization, checkpoint writes, and remote state fetches. Those patterns are topology-sensitive. A design that looks fine under averaged throughput can still perform badly if collective phases line up with the wrong fabric shape.

1. Collective phase: many devices communicate in structured bursts, not random flows.
2. Topology stress: hot links, oversubscribed stages, and switch contention appear unevenly.
3. Power and heat spike: transport components light up precisely when the workload is most synchronized.
4. Effective compute loss: accelerators stall waiting on movement rather than arithmetic.
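The burst sizes behind these phases are easy to estimate. In a standard ring all-reduce, each device transfers roughly 2(N−1)/N times the tensor size; the sketch below (numbers and names chosen for illustration) turns that into a per-device link demand for a given phase window.

```python
def ring_allreduce_bytes_per_device(tensor_bytes: int, n_devices: int) -> float:
    """Bytes each device sends in a ring all-reduce: 2 * (N - 1) / N * size."""
    return 2.0 * (n_devices - 1) / n_devices * tensor_bytes

def burst_gbps(tensor_bytes: int, n_devices: int, phase_seconds: float) -> float:
    """Per-device link demand if the exchange must finish inside one phase."""
    moved = ring_allreduce_bytes_per_device(tensor_bytes, n_devices)
    return moved * 8 / phase_seconds / 1e9

# A 10 GiB gradient tensor across 64 devices, squeezed into a 50 ms window:
print(f"{burst_gbps(10 * 2**30, 64, 0.050):.0f} Gb/s per device")
```

Note what the formula implies: the per-device volume barely shrinks as N grows, so shortening the phase window is what drives the link demand, and every device hits that demand at the same instant.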

5. Order-of-magnitude energy view

Precise numbers vary by implementation, distance, packaging, modulation, and whether the path is package-level, board-level, rack-level, or row-level. The important point is directional: short electrical links can be efficient at tiny distances, but the energy and thermal cost of longer, denser, higher-speed electrical movement rises quickly enough that optics becomes attractive not just for bandwidth, but for power-density relief.

Illustrative energy-per-bit intuition

| Path type | Typical context | Directionally plausible pJ/bit range | System implication |
|---|---|---|---|
| Very short electrical | Package / board-adjacent | Low single digits to low tens | Still attractive when reach is tiny and packaging can tolerate it |
| Longer electrical with heavy conditioning | Board-to-board / dense rack escape | Tens and rising | Retimers, equalization, and thermals dominate the story |
| Optical for longer reach | Rack-to-rack / scale-out / reconfigurable paths | Often more favorable at system level than equivalent long electrical | The win is not just speed; it is lower power-density pain at useful reach |

This is a systems chart, not a vendor claim. The point is architectural intuition, not fake precision.
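The same pJ/bit intuition can be scaled to rack-level watts. The figures below are assumptions picked to match the directional ranges above, not vendor data; the comparison across path types is the point.

```python
# Directionally plausible pJ/bit assumptions for each path type (illustrative).
PATHS_PJ_PER_BIT = {
    "short_electrical": 5,
    "long_electrical_conditioned": 30,
    "optical_longer_reach": 15,
}

def fabric_watts(per_gpu_tbps: float, gpus: int, pj_per_bit: float) -> float:
    """Aggregate transport power for one rack's worth of fabric traffic."""
    return pj_per_bit * 1e-12 * per_gpu_tbps * 1e12 * gpus

for path, pj in PATHS_PJ_PER_BIT.items():
    print(f"{path}: {fabric_watts(1.6, 32, pj):.0f} W")  # 32 GPUs at 1.6 Tb/s each
```

Even with made-up constants, the spread between a well-conditioned long electrical path and an optical one is hundreds of watts per rack before cooling is counted.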

6. Board vs rack vs row regimes

Not all interconnect is the same problem. One reason discussions get muddy is that people collapse very different transport regimes into one word: network.

Board / package regime

  • Main problem: electrical escape, signal integrity, local heat
  • Main tools: short electrical, near-package optics, CPO, NPO
  • Main question: how close can optics move toward the silicon?

Rack / row regime

  • Main problem: cable density, reach, switch stages, topology oversubscription
  • Main tools: pluggables, OCS, VCSEL arrays, rack-scale optics
  • Main question: how much machine coherence can the fabric preserve?
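One way to keep the regimes from blurring into "network" is to make physical reach explicit. A toy classifier, with thresholds chosen only for illustration:

```python
def transport_regime(reach_m: float) -> str:
    """Map physical reach to the likely transport toolbox (toy thresholds)."""
    if reach_m < 0.5:
        return "board/package: short electrical, NPO/CPO candidates"
    if reach_m < 5.0:
        return "rack: pluggables, VCSEL arrays, rack-scale optics"
    return "row/cluster: longer-reach optics, possibly OCS"

print(transport_regime(0.1))
print(transport_regime(20.0))
```

Real boundaries depend on lane rate, materials, and packaging, but the discipline of asking "which regime is this link in?" before arguing about technology is what matters.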

7. Interconnect and the new memory wall

It would be a mistake to treat this essay as replacing the memory wall with a separate network story. The more interesting truth is that the two are converging.

In modern AI systems, memory hierarchy increasingly extends beyond local HBM. Once you start thinking in terms of pooled memory, remote memory, checkpoint tiers, model-state disaggregation, or distributed shared-memory semantics, the cost of movement across the fabric becomes part of the memory problem itself.

If moving data to another rack is too hot, too power-hungry, or too topology-sensitive, then the cluster cannot honestly treat that remote state as cheap logical memory. Interconnect power density becomes a limit on how large a machine can behave like one coherent memory entity.
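A crude way to see that limit is to compare the energy of a local HBM access with the same byte fetched across the fabric. Every constant here is an illustrative assumption, not a measured value.

```python
HBM_PJ_PER_BYTE = 40.0     # assumed local HBM access energy
FABRIC_PJ_PER_BIT = 25.0   # assumed rack-to-rack transport energy, full stack

def remote_pj_per_byte(hops: int) -> float:
    """Energy to fetch one byte from pooled memory `hops` fabric hops away."""
    return HBM_PJ_PER_BYTE + hops * FABRIC_PJ_PER_BIT * 8

def energy_ratio(hops: int) -> float:
    """How much more a remote byte costs than a local one."""
    return remote_pj_per_byte(hops) / HBM_PJ_PER_BYTE

print(energy_ratio(1), energy_ratio(3))  # remote bytes are several times pricier
```

Under these assumptions a single-hop remote byte costs several local accesses, and each additional hop widens the gap, which is exactly why "cheap logical memory" across racks is an energy claim, not just a latency claim.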

8. Where optical switching actually helps

Optical circuit switching is attractive because it changes the shape of the problem: instead of permanently paying for one static topology, the system can reconfigure fabric paths around communication phases.

Good fit for OCS

  • Epoch-like communication phases
  • Large synchronized tensor exchanges
  • Workloads whose path demand can be forecast

Weak fit for OCS

  • Highly random fine-grained traffic
  • Flows whose duration is shorter than reconfiguration benefit
  • Topologies where static overprovisioning is cheaper than control complexity
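The fit/no-fit split above is really an amortization question: does the traffic carried on the reconfigured circuit beat what a static fallback path would have carried in the same time? A minimal model, with all parameters hypothetical and the circuit assumed dark during setup:

```python
def bytes_delivered(flow_s: float, reconfig_s: float,
                    optical_gbps: float, fallback_gbps: float) -> tuple:
    """(Gb delivered via an OCS reconfiguration, Gb on the static fallback)."""
    with_ocs = optical_gbps * max(flow_s - reconfig_s, 0.0)  # dark during setup
    without = fallback_gbps * flow_s
    return with_ocs, without

def worth_reconfiguring(flow_s: float, reconfig_s: float,
                        optical_gbps: float, fallback_gbps: float) -> bool:
    with_ocs, without = bytes_delivered(flow_s, reconfig_s,
                                        optical_gbps, fallback_gbps)
    return with_ocs > without

print(worth_reconfiguring(1.0, 0.05, 800, 400))   # long phase: reconfigure
print(worth_reconfiguring(0.01, 0.05, 800, 400))  # flow shorter than setup: don't
```

The second case is the "weak fit" bullet in miniature: a flow that ends before the circuit comes up delivers nothing on the optical path, so the static fallback wins regardless of its lower rate.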

9. The scheduler must understand fabric

If the first post argued that the AI cluster operating system must understand light, this post explains why: because the network is becoming too expensive, thermally and electrically, to remain invisible to higher-level scheduling policy.

A serious future scheduler will not just place jobs on GPUs. It will reason about communication classes: which transfers are latency-critical, which ones are bursty but deferrable, which paths deserve optical reservation, and which flows can be degraded, delayed, compressed, or rerouted.
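A sketch of what transfer-class awareness might look like inside such a scheduler. The class names, thresholds, and policy below are invented for illustration; a real system would derive them from workload profiles and fabric state.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class TransferClass(Enum):
    LATENCY_CRITICAL = auto()  # e.g. expert dispatch on the forward path
    BULK_RESERVABLE = auto()   # e.g. phase-aligned collectives worth an optical circuit
    DEFERRABLE = auto()        # e.g. checkpoint writes; can be delayed or rerouted

@dataclass
class Transfer:
    name: str
    nbytes: int
    deadline_ms: Optional[float]  # None means best-effort

def classify(t: Transfer) -> TransferClass:
    """Toy policy: tight deadlines first, then size decides reservation."""
    if t.deadline_ms is not None and t.deadline_ms < 10.0:
        return TransferClass.LATENCY_CRITICAL
    if t.nbytes >= 2**30:  # >= 1 GiB earns a reserved path
        return TransferClass.BULK_RESERVABLE
    return TransferClass.DEFERRABLE

print(classify(Transfer("all_reduce", 10 * 2**30, None)).name)
print(classify(Transfer("expert_dispatch", 2**20, 2.0)).name)
```

The shape, not the policy, is the claim: once transfers carry a class, the scheduler can map classes to paths, reservations, and degradation rules instead of treating the fabric as uniform.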

10. What winners will optimize for

  • Power per useful transferred bit, not just theoretical link efficiency
  • Topology-aware scheduling, where fabric constraints shape placement and execution
  • Transport-class awareness, distinguishing collectives, checkpointing, remote memory, and control traffic
  • Serviceable optical architectures, especially where external laser sources preserve operability while deeper optical integration attacks the power-density problem
  • Holistic cluster economics, where the network is judged by delivered model throughput, not isolated component metrics
Compute still matters. But in modern AI systems, the decisive question is increasingly whether the cluster can afford to move the data that compute generates.