
The Cooling Stack Is the New Critical Path: How Blackwell GB300 NVL72 Racks Manage 142 kW

Cooling is no longer a facilities afterthought that gets sorted out after the GPUs arrive. For the Blackwell GB300 NVL72, the thermal system is a precision five-layer architecture with specific component vendors, defined thermal budgets at every interface, and failure modes that propagate directly into training throughput and inference latency. This is how it actually works — from the cold plate on the die to the chiller outside the building.

Manish KL · April 2026 · ~18 min read · Systems Essay
Contents
  1. Why cooling is now a systems problem
  2. The power budget: 142 kW and what generates it
  3. The five-layer cooling architecture
  4. Layer 1 — Cold plates: die-to-liquid interface
  5. Layer 2 — Inner tray manifolds and quick disconnects
  6. Layer 3 — Rack manifold and the UQD system
  7. Layer 4 — Coolant distribution units
  8. Layer 5 — Facility cooling and heat rejection
  9. Hybrid cooling: what stays air-cooled and why
  10. The full vendor map
  11. Failure modes in the thermal chain
  12. What changes with Vera Rubin

Why cooling is now a systems problem

There is a tendency to think of data center cooling as the infrastructure layer that precedes compute — the pipes and pumps and chillers that the facilities team installs before the GPU racks arrive. That mental model was defensible when racks drew 20 kW and could be managed with in-row air cooling. It stopped being defensible with Hopper. It became actively incorrect with Blackwell.

The Blackwell B200 GPU draws up to 1,000 W per chip under sustained load. A GB300 NVL72 rack houses 72 of them across 18 compute trays alongside 36 Grace CPUs, 9 NVLink switch trays, and all their associated networking and power delivery hardware. The aggregate peak draw is approximately 142 kW — in a single rack that occupies the same floor footprint as a conventional 42U server rack. That is seven times the power density of a typical 2022 enterprise GPU server and roughly 200 times the power density of an office building.

At 142 kW in a fixed volume, the thermal challenge is not simply "add more cooling." It is a precision engineering problem. Heat cannot move fast enough through air to keep that many GPU die junctions within their operating temperature windows. The only viable approach is direct liquid cooling — moving coolant physically to the surface of each chip, absorbing heat into the liquid, and carrying it away through a structured plumbing hierarchy. The five layers of that hierarchy are what this essay maps.

When a rack draws 142 kW and a single HBM stack throttles at 95°C, the cooling system is not support infrastructure. It is directly in the critical path of token throughput.

The power budget: 142 kW and what generates it

Before tracing the cooling stack, it is worth being precise about where the heat originates and in what proportions. The GB300 NVL72 is not a single homogeneous heat source — it is a structured collection of components with very different power densities, different cooling requirements, and different consequences for the application if they throttle.

| Component | Count per rack | Power per unit | Cooling method | Trip consequence |
| --- | --- | --- | --- | --- |
| Blackwell Ultra B300 GPU | 72 | ~1,000 W TDP | Direct liquid (cold plate) | MEMCLK reduction, SM throttle, latency spike |
| Grace ARM CPU | 36 | ~300 W TDP | Direct liquid (cold plate) | CPU frequency drop, NVLink-C2C bandwidth loss |
| NVLink 5 Switch (NVSwitch) | 9 trays | ~800–1,000 W per tray | Direct liquid | All-reduce stall, collective BW degradation |
| OSFP optical modules | ~648 (36 per tray) | 3–5 W each | Air (residual airflow) | Link error rate rise, retransmission |
| E1.S NVMe drives | Up to 8 per tray | 5–7 W each | Air | Thermal throttle, I/O latency |
| Power shelves (PSU + conversion) | 6–8 | ~11 kW input each (at ~94% eff.) | Air / partial liquid | VRM degradation, supply ripple |

The critical insight from this table is that the liquid-cooled components — GPUs, CPUs, and NVSwitches — generate over 90% of the total rack heat load. The air-cooled components are a distinct minority. This is why the GB300 NVL72 is described as a hybrid cooling system: it uses liquid cooling for the high-density compute core and retains residual air flow for the lower-density peripherals. That residual airflow must still be managed — but it is not the limiting constraint.

The 142 kW number is an aggregate rack-level peak, not a sustained operational number. Sustained power under real AI workloads typically runs 80–90% of TDP peak, meaning practical rack-level sustained power is in the 115–128 kW range. But the cooling system must be sized for the peak, because prefill bursts routinely approach TDP for the entire GPU complement simultaneously.
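The arithmetic behind these figures can be checked directly against the component table. A quick sketch in Python; the counts and TDPs are the approximate values from the table, and the drive count of 8 per tray is an assumption at the top of the cited range:

```python
# Approximate GB300 NVL72 heat budget, using the component table above.
# All figures are public estimates, not NVIDIA specifications.
components = {
    # name: (count, watts_per_unit, liquid_cooled)
    "B300 GPU":      (72, 1000, True),
    "Grace CPU":     (36, 300, True),
    "NVSwitch tray": (9, 900, True),     # midpoint of 800-1,000 W
    "OSFP optics":   (648, 4, False),    # midpoint of 3-5 W
    "E1.S NVMe":     (144, 6, False),    # assumes 8 drives x 18 trays
}

it_load_w = sum(n * w for n, w, _ in components.values())
liquid_w = sum(n * w for n, w, liq in components.values() if liq)
liquid_share = liquid_w / it_load_w      # ~0.96 of the itemized load

# The 142 kW rack peak also includes NICs, fans, management hardware,
# and PSU conversion losses that are not itemized above.
rack_peak_kw = 142.0
sustained_kw = (0.80 * rack_peak_kw, 0.90 * rack_peak_kw)

print(f"Itemized IT load: {it_load_w/1e3:.1f} kW "
      f"({liquid_share:.0%} liquid-cooled)")
print(f"Sustained band:   {sustained_kw[0]:.0f}-{sustained_kw[1]:.0f} kW")
```

The itemized sum lands near 94 kW, with the GPU/CPU/NVSwitch complement taking well over 90% of it, which is the hybrid-cooling split the previous section describes; the remaining gap to 142 kW is unitemized networking, power conversion, and ancillary hardware.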

The five-layer cooling architecture

The thermal path in a GB300 NVL72 deployment can be understood as five sequential layers, each performing a specific heat transfer function and each with its own set of vendors, failure modes, and performance constraints. Heat enters at Layer 1 and exits the building at Layer 5. Every layer introduces thermal resistance that reduces the available temperature budget for the layer above it.

[Figure 1 diagram: heat flows from the GPU die through the cold plate (Layer 1, T_liquid_out ≈ 45–55°C; Boyd, Asia Vital, Cooler Master), inner tray manifold and quick disconnects (Layer 2, supply ≈ 30–35°C), rack manifold with UQD08 and 450 QDs (Layer 3), CDU (Layer 4, facility water in ≈ 20–28°C), and facility cooling plant (Layer 5, reject to ambient), holding GPU TJ < 90°C.]
Figure 1. The five-layer DLC thermal chain in a GB300 NVL72 deployment. Heat enters at the GPU die (top) and exits the building at the facility cooling plant (bottom). Each layer performs a distinct heat transfer function and introduces a ΔT that reduces the available temperature budget for layers above it. Cool facility water enters at Layer 4 and is conditioned by the CDU before flowing up through the rack manifold and into the per-tray plumbing.

Layer 1 — Cold plates: the die-to-liquid interface

The cold plate is the device that makes direct liquid cooling physically possible. It is a precision-machined metal block — typically brazed copper with internal microchannels — that clamps directly to the surface of the GPU die or CPU package, with a thin thermal interface material (TIM) filling the microscopic gap between the plate and the package lid. Coolant flows through the microchannels, absorbs heat conducted from the die, and exits the plate carrying that heat toward the rest of the system.

The engineering challenge at this layer is minimizing thermal resistance while maintaining mechanical reliability across thermal cycles. A GPU package in full operation and then idle will expand and contract as its temperature changes. The cold plate, the TIM, and the mounting hardware must accommodate that cycling without creating micro-cracks in solder joints or gaps in the TIM layer — either of which would increase thermal resistance and eventually cause throttling.

For the GB300 NVL72, cold plates are required for 72 GPU packages and 36 CPU packages per rack — 108 precision assemblies that must each maintain low thermal resistance for the platform's full lifetime.

The TIM between cold plate and package lid is as important as the plate geometry. State-of-the-art TIMs for this application are typically indium-based metallic thermal interface materials or high-performance polymer-based compounds with thermal conductivity in the 8–15 W/m·K range. The interface resistance budget is tight: for a 1,000 W GPU package with a cold plate contact area of roughly 60 cm², even a 0.05 K·cm²/W interface resistance translates to a 1°C temperature rise — which at the margins of the HBM thermal budget matters.
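The interface-resistance arithmetic in that last sentence is worth making explicit. A sketch using the illustrative figures from the text; the relation is simply ΔT = q″ · R_int, heat flux times interface resistance:

```python
# Temperature rise across the TIM layer: dT = q'' * R_int, where q'' is
# heat flux (W/cm^2) and R_int is interface resistance (K*cm^2/W).
# Values are the illustrative figures from the text, not measured data.
power_w = 1000.0     # B300 package power
area_cm2 = 60.0      # approximate cold plate contact area
r_int = 0.05         # interface resistance, K*cm^2/W

heat_flux = power_w / area_cm2    # ~16.7 W/cm^2
delta_t = heat_flux * r_int       # ~0.83 K

print(f"Heat flux: {heat_flux:.1f} W/cm^2 -> TIM dT: {delta_t:.2f} K")
```

At these heat fluxes, every hundredth of a K·cm²/W of interface resistance costs a sixth of a degree, which is why TIM selection and pump-out resistance get so much engineering attention.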

Why cold plate geometry has become a competitive differentiator: The Blackwell B300 package is physically larger than B200 and has a different thermal mass distribution, with HBM3e stacks concentrated in specific quadrants of the package lid. A cold plate designed with uniform microchannel spacing will have different thermal performance than one tuned to the specific hotspot map of the die. Boyd's per-platform customization approach — different designs for GB200 vs. GB300 — reflects this reality.

Layer 2 — Inner tray manifolds and quick disconnects

A single compute tray in the GB300 NVL72 houses two Grace-Blackwell superchip modules — each pairing one Grace CPU with two B300 GPUs, for four GPUs and two CPUs per tray — along with their cold plates, local plumbing, and power delivery hardware. The inner tray manifold is the plumbing network that distributes coolant from the rack's internal supply line to each cold plate on the tray, and collects heated coolant from those plates for return to the rack manifold.

The key engineering element at this layer is the quick disconnect (QD) fitting — the mechanical coupling that allows a compute tray to be inserted into or removed from the rack without tools, without manually disconnecting any hoses, and ideally without spilling or introducing any coolant contamination into the system. For the GB300 NVL72, each compute tray requires 24 quick disconnect fittings — significantly more than the 10 per tray required for the GB200, reflecting the higher per-tray component count and the more complex internal plumbing of the Blackwell Ultra package.

Quick disconnects are a surprisingly high-stakes component for what appears to be simple plumbing hardware. A leaking QD on a compute tray inside a running rack can cause water damage to electronics worth millions of dollars, require emergency shutdown of a pod consuming dozens of megawatts, and result in weeks of downtime while trays are dried, inspected, and the cooling system is flushed and refilled. This is why QD supplier selection is treated as a mission-critical decision at the rack design stage.

The primary QD suppliers qualified for NVIDIA's rack generations are: CPC (Colder Products Company, US); Parker Hannifin (US) through its Parker Quick Couplings division, a leading supplier of non-spill couplings for data center liquid cooling; Danfoss (Denmark) through its hydraulics division; and Stäubli (Switzerland) with its precision fluid connectors. LOTES and Fositek (Taiwan) are completing qualification as secondary suppliers for GB300.

The coupling standard for this generation is the UQD (Universal Quick Disconnect) format for GB200 trays, with a move to smaller NVIDIA-proprietary disconnects for GB300 at a unit cost of roughly $45–50 versus $70 for the UQD units — reflecting both production volume and design optimization for NVIDIA's specific tray geometry.

Layer 3 — The rack manifold: distributing coolant to 18 trays

The rack manifold is the internal distribution tree that connects the external CDU supply and return lines to the inner manifolds of all 18 compute trays and 9 NVLink switch trays in the rack. NVIDIA designates this as the UQD08 manifold for third-generation MGX racks — a new internal manifold design introduced with Vera Rubin but also relevant to GB300 deployments.

The manifold must maintain consistent coolant pressure and flow rate across all 27 trays (18 compute + 9 switch) simultaneously, despite the fact that each tray's cold plates have slightly different flow resistance characteristics and operate at different power levels depending on workload. A compute tray running dense prefill at 100% GPU utilization may need significantly higher coolant flow than a switch tray at moderate bandwidth load. The manifold design and flow balancing valves must accommodate this asymmetry without allowing any single tray to run hot.

[Figure 2 diagram: CDU supply at 30–35°C feeds a vertical supply header serving 18 compute trays (4× B300 GPU + Grace CPU, 24 QDs each) and 9 NVLink switch trays (2 QDs each); the return header carries 45–55°C coolant back to the CDU. QD count: 18 × 24 + 9 × 2 = 450 per rack, vs. 198 on GB200 (~2.3×). Suppliers: Danfoss, CPC, Parker, Stäubli.]
Figure 2. Rack manifold coolant distribution in GB300 NVL72. Cool supply water (blue) flows down the supply header to each tray. Heated return water (red) flows back up the return header to the CDU. The GB300 generation requires 450 quick disconnect fittings per rack — 2.3× the 198 needed for GB200. Danfoss, CPC, Parker Hannifin, and Stäubli are the primary QD suppliers.
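The QD totals in Figure 2 follow from simple per-tray counts, using the figures stated in the text:

```python
# GB300 NVL72 quick-disconnect count, per the tray figures in the text.
compute_qds = 18 * 24   # 18 compute trays x 24 QDs each
switch_qds = 9 * 2      # 9 NVLink switch trays x 2 QDs each
gb300_total = compute_qds + switch_qds   # 450

gb200_total = 18 * 10 + 9 * 2            # GB200: 10 QDs per compute tray
ratio = gb300_total / gb200_total        # ~2.3x QD intensity increase

print(gb300_total, gb200_total, f"{ratio:.1f}x")
```

Each of those 450 fittings is a potential leak point, which is why the non-spill rating of the QD vendors in the previous section is treated as mission-critical.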

A critical design constraint at this layer is the approach temperature — the difference between the facility supply water temperature entering the CDU and the coolant temperature exiting the cold plate. The narrower this gap, the less room the thermal system has to absorb heat from the GPUs before coolant temperature exceeds the HBM throttle threshold. For a GB300 NVL72 with a 35°C CDU supply, a 20°C cold-plate ΔT, and an HBM throttle threshold of 95°C, there is meaningful headroom — but it evaporates quickly if inlet water temperature rises due to chiller degradation or high ambient conditions.
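That headroom can be stacked up explicitly. The sketch below uses the temperatures from the text plus an assumed lumped die-to-coolant rise; `die_to_coolant_c` is a hypothetical figure for illustration, not a measured Blackwell value:

```python
# Thermal stack-up from facility water to HBM junction, with illustrative
# defaults. die_to_coolant_c is an assumed lumped rise, not a spec value.
def hbm_headroom_c(facility_supply_c,
                   cdu_approach_c=5.0,      # facility -> rack-side supply
                   cold_plate_dt_c=20.0,    # coolant rise across cold plate
                   die_to_coolant_c=15.0,   # assumed lumped junction rise
                   hbm_throttle_c=95.0):
    rack_supply = facility_supply_c + cdu_approach_c
    coolant_out = rack_supply + cold_plate_dt_c
    junction = coolant_out + die_to_coolant_c
    return hbm_throttle_c - junction

print(hbm_headroom_c(25.0))   # 30.0 C of headroom at nominal facility water
print(hbm_headroom_c(32.0))   # 23.0 C on a hot day with chiller degradation
```

Every degree of facility supply rise is subtracted one-for-one from the HBM margin, which is the mechanism behind the "it evaporates quickly" warning above.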

Layer 4 — Coolant distribution units: the thermal brain of the rack cluster

The coolant distribution unit is where facility cooling infrastructure connects to the IT equipment. The CDU performs three essential functions: it conditions the coolant that flows into the rack (adjusting temperature and pressure), it circulates that coolant through the rack loop using internal pumps, and it exchanges heat between the rack-side secondary coolant loop and the facility-side primary loop. On the rack-side loop, the CDU pushes dielectric or water-based coolant at the right pressure to maintain flow through all the cold plates. On the facility side, it connects to the data center's chilled water plant.

The dominant CDU architecture for Blackwell generation deployments is liquid-to-liquid (L2L): facility chilled water enters one side of an internal heat exchanger, rack coolant circulates on the other side, and heat transfers across the exchanger plates. The two fluid loops never mix, which keeps the facility water loop clean and allows the rack coolant to be a different chemical composition or corrosion-inhibitor blend optimized for GPU package compatibility.
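The sizing that falls out of the energy balance is straightforward: for a heat load Q and a loop temperature rise ΔT, the required mass flow is ṁ = Q / (c_p·ΔT). A sketch assuming a water-like coolant; real rack loops use water plus corrosion inhibitors and run vendor-specific ΔT setpoints:

```python
# Required rack-side coolant flow from the energy balance Q = m_dot*c_p*dT.
# Assumes water-like properties (c_p ~ 4186 J/kg*K, ~1 kg/L).
def flow_lpm(heat_kw, dt_c, cp=4186.0, rho_kg_per_l=1.0):
    m_dot = heat_kw * 1e3 / (cp * dt_c)   # mass flow, kg/s
    return m_dot / rho_kg_per_l * 60.0    # volumetric flow, L/min

print(f"{flow_lpm(142.0, 15.0):.0f} L/min")  # full rack at a 15 C loop dT
```

Roughly 135 L/min for a full rack at a 15°C loop rise: this is the flow the CDU pumps must sustain through 450 QDs and 108 cold plates, continuously, for the platform's lifetime.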

Vertiv captured over 70% of volume share for the first wave of GB200 NVL72 deployments, driven by their Liebert XDU series CDUs — specifically the XDU500 and larger variants capable of handling multiple racks. Vertiv co-developed the GB200 NVL72 reference architecture with NVIDIA, and the Liebert XDU became the de facto standard that hyperscalers deploying early Blackwell generations specified. A single L2L CoolIT CDU supporting up to 8 NVL72 racks is priced at approximately $140K — about $18K of cooling infrastructure cost per rack. Vertiv's Liebert 1350 series, at $150–200K, represents the higher end of the market.

The competitive landscape for GB300 includes: CoolIT Systems (private, Calgary) with their CHx2000 series delivering 2 MW cooling capacity at 5°C approach temperature; Motivair (now Schneider Electric's liquid cooling division following its $850M acquisition in October 2024), whose dynamic cold plate and CDU technologies are integrated into Schneider's reference architecture for NVIDIA clusters; nVent Electric with its new modular row and rack CDUs including the Project Deschutes-inspired design aligned to Google's OCP specification; Boyd Corporation with in-row and in-rack CDU offerings validated for GB200; Delta Electronics (Taiwan) as a qualified CDU supplier; and a second tier of Taiwanese vendors including Gigabyte, Envicool, and MGCooling completing qualification for GB300.

The CDU is the component most likely to define data center PUE. A CDU operating with 5°C approach temperature (facility supply 25°C → rack supply 30°C) is thermally efficient but requires colder — and more energy-intensive — facility chilled water. One operating at 15°C approach temperature can use warmer facility water, which enables free-cooling in more climates but leaves less thermal headroom for the GPU stack. The facility water temperature is not just an infrastructure spec — it determines whether the HBM on your B300 GPUs throttles under sustained prefill load.

Layer 5 — Facility cooling: heat rejection outside the building

The CDU discharges heat into the facility's primary chilled water loop. That loop must carry the heat to somewhere it can be rejected to the ambient environment. There are four primary facility cooling approaches deployed at hyperscale AI data centers running Blackwell-generation infrastructure:

| Method | Typical PUE contribution | Water consumption | Temperature limit | Key vendors |
| --- | --- | --- | --- | --- |
| Mechanical chiller | 1.2–1.4 | Low (closed loop) | Works in any climate | Vertiv AFC chiller, Schneider, Modine |
| Cooling tower (evaporative) | 1.05–1.15 | High (evaporation) | Humid climate challenge | Baltimore Aircoil, EVAPCO |
| Dry cooler / free cooling | 1.02–1.08 | Minimal | Works when ambient <30°C and supply temp ≥35°C | Airedale (VRT), Modine, Schneider |
| Hybrid (chiller + free cooling) | 1.08–1.20 | Low to moderate | Climate-adaptive, most flexible | Vertiv/Schneider reference designs |

The Schneider Electric reference design for GB300 NVL72 — co-engineered with NVIDIA — supports up to 142 kW per rack and uses liquid-to-liquid CDUs paired with high-temperature chillers. The design covers four technical areas: facility power, facility cooling, IT space layout, and lifecycle software, with the ETAP and EcoStruxure IT Design CFD models enabling digital twin simulation of specific power and cooling scenarios before physical deployment.

Modine Manufacturing has emerged as a notable second-tier facility cooling player, with its data center segment growing 31% sequentially in Q3 FY2026. Modine's Cooling AI control system and 1 MW CDU products target the chiller and dry-cooler layers of the facility stack, with a five-year order backlog indicating sustained demand for the Blackwell and Rubin deployment cycles.

Hybrid cooling: what stays air-cooled in GB300 and why

The GB300 NVL72 is described as a hybrid cooling architecture because not everything in the rack is liquid-cooled. The components that remain on air cooling are: OSFP optical transceiver modules, E1.S NVMe storage drives, and the power distribution boards (PDBs). This is not an oversight — it is a deliberate thermal engineering decision.

These components have two characteristics that make air cooling still viable: their heat flux density is far lower than that of a GPU die, and their performance degradation under elevated temperature is either gradual (drives throttle; optical modules tolerate wider temperature ranges) or manageable at current densities. The residual airflow in the rack — no longer serving the GPUs or CPUs but still driven by the PSU fans and slot airflow — is sufficient for these components.

The practical consequence is that GB300 deployment requires both a liquid cooling supply infrastructure and continued attention to rack airflow patterns. A data center that eliminates all in-row air cooling because "it's a liquid-cooled rack" will find its optical transceivers running hotter than specified, which shortens their service life and can increase link error rates on the ConnectX-9 NICs that connect GPUs to the scale-out fabric.

The full vendor map

| Layer | Function | Primary vendors | Notes |
| --- | --- | --- | --- |
| L1 — Cold plate | Die-to-liquid heat transfer | Boyd, Asia Vital (AVC), Cooler Master | Boyd on NVIDIA AVL. Per-platform custom geometry. |
| L2 — QD / inner manifold | Blind-mate tray connections | Parker Hannifin, Danfoss, CPC, Stäubli, LOTES | 450 QDs per rack. Non-spill critical. |
| L3 — Rack manifold | Rack-level coolant distribution | Cooler Master, Auras, Boyd, NVIDIA-designed UQD08 | UQD08 standard for 3rd-gen MGX. |
| L4 — CDU | L2L heat exchange, pump, conditioning | Vertiv (Liebert XDU), CoolIT, Motivair (SE), nVent, Boyd, Delta, Gigabyte | Vertiv held 70%+ share in GB200 wave. |
| L5 — Facility cooling | Chiller, dry cooler, cooling tower | Vertiv (AFC), Schneider Electric, Modine, Carrier | Schneider holds NVIDIA reference design for GB300. |
| Coolant | Heat transfer fluid | Honeywell (Novec candidate), Lubrizol, Engineered Fluids | Water + inhibitor for primary DTC loops. |
| TIM | Package-to-cold-plate interface | Indium, Shin-Etsu, Honeywell PTM | Metallic TIM preferred for thermal cycling. |

Failure modes in the thermal chain

The five-layer architecture creates five categories of failure mode, each with a distinct signature in telemetry and a different consequence for compute throughput. Understanding this taxonomy is necessary for building monitoring systems that can attribute thermal performance degradation to the correct layer.

Cold plate degradation manifests as increased thermal resistance at a specific GPU package — that GPU's HBM temperature rises faster and higher than its neighbors under equivalent workload. The cause is typically TIM pump-out (the TIM migrates over thermal cycles until it no longer fills the gap uniformly), or micro-crazing of the cold plate brazing under repeated thermal expansion. The signature is a single-device HBM temperature outlier that worsens over months.

QD partial blockage or slow leak manifests as reduced flow through a single compute tray, causing all four GPU packages on that tray to run hotter simultaneously. The signature is a tray-correlated thermal anomaly — all GPUs on tray N run 5–8°C hotter than GPUs on adjacent trays at the same utilization. A slow leak shows up as coolant volume decrease in the CDU reservoir and a gradual rise in rack-side coolant inlet temperature.

CDU pump degradation manifests as reduced system-wide coolant flow rate, causing the entire rack to run hotter at a given workload. The signature is a rack-level thermal rise that correlates with CDU pump RPM dropping from nominal. CDUs with dual-pump redundancy can mask this until the second pump starts showing similar degradation.

Facility supply water temperature rise is the most insidious failure mode because it has no signature within the rack's own telemetry — the rack thermometry looks normal, but HBM temperatures drift upward over hours as chiller efficiency degrades during peak summer loads or partial chiller failures. The correct monitoring attachment point is the CDU facility water inlet sensor, not GPU temperature alone.

Manifold flow imbalance between trays can arise if a flow-balancing valve shifts or if scale or particulate accumulates in one branch of the distribution tree. The signature is systematic temperature differences between trays that should be thermally equivalent, which is detectable only if per-tray coolant flow sensors are present — a capability that not all CDU vendors expose in their management interfaces.
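This taxonomy maps naturally onto a layered attribution check: test for the rack-level signature first, then tray-correlated anomalies, then single-device outliers. A minimal sketch, where the thresholds and the `classify` interface are illustrative and not any vendor's telemetry API:

```python
# Layered attribution of HBM thermal anomalies, following the taxonomy
# above. temps_c maps (tray, gpu) -> HBM temperature in C for one rack.
# Thresholds are illustrative placeholders, not vendor telemetry semantics.
from statistics import mean

def classify(temps_c, baseline_c=72.0, dev_margin=8.0,
             tray_margin=5.0, rack_margin=4.0):
    rack_mean = mean(temps_c.values())
    # Rack-wide rise: CDU pump degradation or facility water trouble.
    if rack_mean > baseline_c + rack_margin:
        return "rack-level: check CDU pumps and facility water inlet"
    # Tray-correlated rise: QD blockage or manifold flow imbalance.
    trays = {}
    for (tray, _gpu), t in temps_c.items():
        trays.setdefault(tray, []).append(t)
    for tray, ts in trays.items():
        others = [t for tr, vs in trays.items() if tr != tray for t in vs]
        if mean(ts) > mean(others) + tray_margin:
            return f"tray-level on tray {tray}: check QDs and flow balance"
    # Single-device outlier: cold plate or TIM degradation.
    for key, t in temps_c.items():
        if t > rack_mean + dev_margin:
            return f"device-level at {key}: check cold plate / TIM"
    return "no thermal anomaly"

nominal = {(t, g): 70.0 for t in range(1, 19) for g in range(4)}
hot_tray = {**nominal, **{(7, g): 77.0 for g in range(4)}}
print(classify(hot_tray))   # tray-level on tray 7: check QDs and flow balance
```

The ordering matters: a facility-water problem raises every tray at once, so tray and device checks only run once the rack-level signature has been ruled out.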

What changes with Vera Rubin

The Blackwell cooling architecture — five layers, hybrid liquid-plus-air, 30–35°C CDU supply, QD-intensive tray plumbing — is the current production standard. Vera Rubin NVL72 changes several of its fundamental constraints. It is 100% liquid-cooled with no air-cooled components remaining in the compute or switch trays. It operates at a 45°C supply temperature rather than 30–35°C, unlocking free-cooling in a much wider range of climates. And it replaces the existing QD-and-hose tray interconnect with a blind-mate PCB midplane that eliminates hoses, cables, and most of the manual assembly points entirely.
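The free-cooling impact of the 45°C supply is easy to see: a dry cooler can deliver facility water no colder than ambient plus its approach temperature. A sketch with an assumed ~8°C dry-cooler approach; the approach figure is a hypothetical, equipment-dependent value:

```python
# Dry-cooler (free-cooling) viability: achievable supply = ambient + approach.
# The 8 C approach is an assumed figure; real equipment varies.
def free_cooling_ok(ambient_c, required_supply_c, approach_c=8.0):
    return ambient_c + approach_c <= required_supply_c

for ambient in (20, 30, 36):
    print(f"ambient {ambient}C: "
          f"Blackwell (32C supply): {free_cooling_ok(ambient, 32.0)}, "
          f"Rubin (45C supply): {free_cooling_ok(ambient, 45.0)}")
```

Under these assumptions a ~32°C Blackwell-class supply loses free cooling somewhere in the mid-20s °C ambient, while a 45°C Rubin-class supply holds it into the high 30s, which is the climate-range expansion the paragraph above describes.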

Each of those changes has a supply chain consequence. The 45°C supply temperature creates demand for CDUs designed around higher approach temperatures. The fan-free, hose-free rack design means QD manufacturers face a structural reduction in rack-level QD count — replaced by the PCB midplane vendors that supply the new connector system. And the liquid-cooled busbar — a new requirement in Vera Rubin that did not exist in Blackwell — creates a component category with no current volume supply ecosystem.

The thermal chain from die to facility does not change in its essential logic between Blackwell and Vera Rubin. What changes is where the engineering leverage and commercial opportunity sit at each layer. That is the subject of the companion essay.


Manish KL writes about AI infrastructure, memory systems, accelerator architecture, and cooling systems. Related essays: AI Cluster Reliability Beyond Fault-Tolerant Parallelism · The Next AI Cluster Failure Won't Look Like a GPU Failure · The Real AI Bottleneck Is Moving From Compute to Interconnect Power Density

© 2026 Manish KL