1. The old scale-out mindset
Classical optics narratives were built around distance and throughput: longer reaches, higher speeds, cleaner transport between independent boxes. That model still matters. But AI clusters have changed the meaning of networking.
Once a training or inference system spans many accelerators that are expected to behave like one tightly coordinated machine, the network stops being a peripheral transport fabric and starts becoming part of the machine’s internal nervous system.
2. Why scale-up is taking over
Scale-up is taking over because AI workloads are intolerant of wasted movement inside tightly coupled domains. Gradient exchange, expert traffic, remote memory access, checkpoint coordination, and multi-chip execution all punish latency and fabric inefficiency much more harshly than older, looser distributed workloads did.
Scale-out priorities
- More aggregate bandwidth across a broader fabric.
- Longer reach and large switching domains.
- Optics as high-capacity inter-box plumbing.
Scale-up priorities
- Deterministic low-latency behavior inside a tightly coupled compute domain.
- Higher packaging and power-density sensitivity.
- Optics as an enabler of local machine coherence, not just remote transport.
That is exactly why current vendor demos are interesting. Lumentum’s OFC 2026 VCSEL demonstration is explicitly aimed at rack-level scale-up architectures using “slow and wide” protocols such as UCIe and PCIe, and Marvell and Lumentum are showing optical circuit switching as a next-generation AI scale-up fabric tool.
3. What scale-up optics actually means
Scale-up optics does not simply mean “the same module, but closer.” It means optics being used to solve a different kind of problem.
- Old framing: move more bits between separate systems.
- New framing: preserve low-latency, high-bandwidth behavior inside one logical machine.
- Design implication: packaging, lane strategy, topology, and switch behavior matter more.
- Operational result: optics becomes part of system architecture, not just interconnect procurement.
That is why scale-up attracts technologies that might have looked niche in a purely distance-centric world: VCSEL arrays, OCS, near-package optics, co-packaged optics, and even optical paths that are explicitly designed around package and rack topology rather than metro-style reach.
4. The technologies that matter
What becomes more important in a scale-up world
| Technology | Why it matters for scale-up | What it changes |
|---|---|---|
| VCSEL arrays | Good fit for short-reach, slow-and-wide, rack-level links | Make dense local optical fabrics more practical |
| Optical circuit switching | Lets the fabric become reconfigurable and more topology-aware | Moves optics from static links toward dynamic machine composition |
| CPO / NPO | Reduce electrical escape burden and power density inside local domains | Push optics inward toward the heart of the machine |
| ELS architectures | Preserve serviceability while deeper optical integration increases | Balance efficiency with operability |
Coherent’s OFC 2026 portfolio and CPO announcements reinforce this point too: the company is not pitching one monolithic optical future, but multiple CPO approaches spanning silicon photonics, VCSEL, and InP-on-silicon, explicitly across both scale-out and scale-up scenarios.
5. Reliability tax inside the machine
There is a cost to moving optics inward: the blast radius of failure grows. In a loose scale-out network, a bad link is often an infrastructure event. In a tightly coupled 256-GPU or 512-GPU scale-up domain, a bad local optical component can become a machine-coherence event.
Old scale-out failure model
- A failed link is often absorbed by routing around it.
- The affected domain may be large in reach but loose in coupling.
- Performance loss matters, but coherence is less fragile.
Scale-up failure model
- A failed internal optical path can impair one tightly coupled machine, not just one route.
- Latency asymmetry and degraded bandwidth can break collective efficiency quickly.
- Redundancy, de-rating, and stronger error handling become architectural requirements.
This is why the scale-up future is not only about faster optics. It is also about making those optics software-visible, degradable, and survivable. If optics becomes part of the machine’s internal nervous system, then fault isolation, redundancy, and correction have to evolve with it.
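As a concrete sketch of what “software-visible and degradable” could mean, the toy model below de-rates an optical link to its healthy lanes and checks the domain against a minimum-bandwidth floor instead of declaring the whole machine failed. All names, lane counts, and thresholds here are illustrative assumptions, not any vendor’s telemetry model.

```python
# Hypothetical sketch: de-rating a degraded optical link inside a tightly
# coupled scale-up domain. Names and thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class OpticalLink:
    lanes_total: int
    lanes_healthy: int
    gbps_per_lane: float

    @property
    def usable_gbps(self) -> float:
        # De-rate to the healthy lanes only, instead of failing the link.
        return self.lanes_healthy * self.gbps_per_lane

def domain_floor_met(links: list[OpticalLink], floor_gbps: float) -> bool:
    # A collective is gated by its slowest member, so the domain's effective
    # bandwidth is the minimum across its internal links.
    return min(link.usable_gbps for link in links) >= floor_gbps

links = [OpticalLink(8, 8, 100.0), OpticalLink(8, 6, 100.0)]
print(domain_floor_met(links, 500.0))  # True: the de-rated link still gives 600 Gb/s
print(domain_floor_met(links, 700.0))  # False: the de-rated link is below the floor
```

The design point is the `usable_gbps` de-rate: the fabric reports a degraded capability upward rather than a binary up/down, which is what lets the layers above decide whether the machine is still coherent enough to run.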
6. Why software has to care
Scale-up optics is where this series’ earlier “optics is an OS problem” thesis comes back with force. Inside a scale-up domain, software cannot afford to treat the fabric as a black box. The scheduler needs to know which optical paths are local, which are degraded, which are power-expensive, and which topologies support the communication phase the workload is about to enter. This matters most in collective-heavy phases such as All-Reduce and All-to-All, where a few bad internal paths can create disproportionate tail-latency penalties across the whole machine.
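The tail-latency point can be made with simple arithmetic. In a ring all-reduce, every step moves a chunk over every link, so step time is set by the slowest link rather than the average. The sketch below uses illustrative numbers, not measurements, to show one de-rated link taxing every participant equally.

```python
# Hypothetical sketch: why one degraded internal path taxes the whole machine.
# In a ring all-reduce, each step pushes the same chunk over every link, so
# step time is governed by the slowest link. Numbers are illustrative.

def ring_allreduce_time_s(chunk_bytes: int, link_gbps: list[float]) -> float:
    steps = 2 * (len(link_gbps) - 1)        # reduce-scatter + all-gather passes
    slowest_bps = min(link_gbps) * 1e9 / 8  # bytes/s of the worst link
    return steps * (chunk_bytes / slowest_bps)

healthy = [400.0] * 8
one_bad = [400.0] * 7 + [100.0]             # a single link de-rated to 25%

t_ok = ring_allreduce_time_s(256 * 2**20, healthy)
t_bad = ring_allreduce_time_s(256 * 2**20, one_bad)
print(round(t_bad / t_ok, 1))  # 4.0: one bad link slows all eight participants 4x
```

The asymmetry is the point: the degraded link represents one eighth of the fabric but imposes its full slowdown on the entire collective, which is why the scheduler needs per-path visibility rather than aggregate bandwidth counters.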
This is also where optics starts to blur into memory semantics. In a 100 kW-plus rack world, the winning local fabric is not merely the one with the lowest latency; it is the one with the best joules-per-bit at the topology points the workload hits hardest. Once scale-up fabrics help determine whether a cluster behaves like one large logical machine, the cost of optical movement becomes part of the cost of remote memory, collective execution, and even model partitioning.
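One way to make “joules per bit at the hot topology points” concrete is a weighted sum: total movement energy is the traffic at each topology tier times the picojoules-per-bit at that tier. The figures below are illustrative assumptions, not vendor data; the point is that the fabric with the cheaper package escape wins when the workload’s traffic is escape-heavy.

```python
# Hypothetical sketch: comparing local fabrics by joules-per-bit at the
# topology tiers a workload actually hits. All figures are illustrative.

def fabric_energy_joules(traffic_bits: dict, pj_per_bit: dict) -> float:
    # Total movement energy = sum over tiers of (bits moved * pJ/bit), in joules.
    return sum(traffic_bits[tier] * pj_per_bit[tier] for tier in traffic_bits) / 1e12

# One training step's traffic by tier (bits), skewed toward package escape.
traffic = {"package_escape": 8e12, "rack_fabric": 2e12}

fabric_a = {"package_escape": 5.0, "rack_fabric": 10.0}   # CPO-like: cheap escape
fabric_b = {"package_escape": 12.0, "rack_fabric": 6.0}   # pluggable-like: cheap fabric

print(fabric_energy_joules(traffic, fabric_a))  # 60.0 J per step
print(fabric_energy_joules(traffic, fabric_b))  # 108.0 J per step
```

Fabric B has the better rack-tier efficiency, yet loses overall because this workload’s traffic concentrates at the escape tier, which is exactly the sense in which the cost of optical movement becomes part of model-partitioning decisions.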
7. 2027–2028 prediction: CPO first, OCS next
The likely winner in the 2027–2028 cycle is not “CPO or OCS” in the abstract. It is a layered scale-up stack where different technologies win at different distances and duties.
A practical prediction by role
| Layer | Likely near-term winner | Why |
|---|---|---|
| Innermost fixed, always-hot links near high-radix silicon | CPO / near-package optics | Best aligned with electrical escape relief, local power density, and fixed high-duty-cycle paths |
| Rack / pod-scale dynamic composition layer | OCS | Best aligned with topology reconfiguration, workload-phase adaptation, and flexible machine composition |
| Short-reach, slow-and-wide local fabrics | VCSEL-rich scale-up paths | Compelling where lane parallelism and dense local optical movement matter more than long reach |
So my forecast is simple: CPO and near-package optics likely win first where the problem is immediate and brutally physical — getting out of the package and across the board without drowning in electrical loss and heat. OCS becomes more important next where the problem is not local escape but dynamic rack or pod composition.
8. Where the next battle will be fought
The next battle will not only be over who ships the next transceiver generation. It will be over who defines the local optical fabric of the AI machine. And in practice that means solving not just for bandwidth and latency, but for joules per bit under real rack-scale power constraints.
- Who wins the rack-scale optical topology?
- Who controls the tradeoff between electrical proximity and optical flexibility?
- Who makes scale-up fabrics software-visible enough to become schedulable resources?
- Who solves the packaging, thermal, and serviceability problems well enough that local optics becomes operationally normal?
That is why “scale-out was yesterday” should not be read literally. Scale-out is still huge. The better point is that the most underappreciated optical opportunity has moved inward, toward scale-up, where AI infrastructure looks least like a conventional network and most like one giant machine trying to stay coherent under pressure.
Series note: after building the case for optics as an OS problem, a power-density problem, a failure-domain problem, a materials problem, and a market problem, this essay argues that the next real optical contest is moving toward scale-up fabrics inside the AI machine itself.
© 2026 Manish KL