Cooling is no longer a facilities afterthought that gets sorted out after the GPUs arrive. For the Blackwell GB300 NVL72, the thermal system is a precision five-layer architecture with specific component vendors, defined thermal budgets at every interface, and failure modes that propagate directly into training throughput and inference latency. This is how it actually works — from the cold plate on the die to the chiller outside the building.
There is a tendency to think of data center cooling as the infrastructure layer that precedes compute — the pipes and pumps and chillers that the facilities team installs before the GPU racks arrive. That mental model was defensible when racks drew 20 kW and could be managed with in-row air cooling. It stopped being defensible with Hopper. It became actively incorrect with Blackwell.
The Blackwell Ultra B300 GPU draws up to 1,000 W per chip under sustained load. A GB300 NVL72 rack houses 72 of them across 18 compute trays alongside 36 Grace CPUs, 9 NVLink switch trays, and all their associated networking and power delivery hardware. The aggregate peak draw is approximately 142 kW — in a single rack that occupies the same floor footprint as a conventional 42U server rack. That is seven times the power density of a typical 2022 enterprise GPU server and roughly 200 times the power density of an office building.
At 142 kW in a fixed volume, the thermal challenge is not simply "add more cooling." It is a precision engineering problem. Heat cannot move fast enough through air to keep that many GPU die junctions within their operating temperature windows. The only viable approach is direct liquid cooling — moving coolant physically to the surface of each chip, absorbing heat into the liquid, and carrying it away through a structured plumbing hierarchy. The five layers of that hierarchy are what this essay maps.
When a rack draws 142 kW and a single HBM stack throttles at 95°C, the cooling system is not support infrastructure. It is directly in the critical path of token throughput.
Before tracing the cooling stack, it is worth being precise about where the heat originates and in what proportions. The GB300 NVL72 is not a single homogeneous heat source — it is a structured collection of components with very different power densities, different cooling requirements, and different consequences for the application if they throttle.
| Component | Count per rack | Power per unit | Cooling method | Trip consequence |
|---|---|---|---|---|
| Blackwell Ultra B300 GPU | 72 | ~1,000 W TDP | Direct liquid (cold plate) | MEMCLK reduction, SM throttle, latency spike |
| Grace ARM CPU | 36 | ~300 W TDP | Direct liquid (cold plate) | CPU frequency drop, NVLink-C2C bandwidth loss |
| NVLink 5 Switch (NVSwitch) | 9 trays | ~800–1,000 W per tray | Direct liquid | All-reduce stall, collective BW degradation |
| OSFP optical modules | ~648 (36 per tray) | 3–5 W each | Air (residual airflow) | Link error rate rise, retransmission |
| E1.S NVMe drives | Up to 8 per tray | 5–7 W each | Air | Thermal throttle, I/O latency |
| Power shelves (PSU + conversion) | 6–8 | ~11 kW input each (at ~94% eff.) | Air / partial liquid | VRM degradation, supply ripple |
The critical insight from this table is that the liquid-cooled components — GPUs, CPUs, and NVSwitches — generate over 90% of the total rack heat load. The air-cooled components are a distinct minority. This is why the GB300 NVL72 is described as a hybrid cooling system: it uses liquid cooling for the high-density compute core and retains residual air flow for the lower-density peripherals. That residual airflow must still be managed — but it is not the limiting constraint.
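That claim is worth sanity-checking with the table's own figures. The back-of-envelope sketch below uses midpoint values from the component table above — approximations, not measured data:

```python
# Back-of-envelope: share of rack heat on the liquid loop vs. residual air,
# using midpoint figures from the component table above (approximations).

liquid_w = (
    72 * 1_000      # B300 GPUs, ~1,000 W TDP each
    + 36 * 300      # Grace CPUs, ~300 W TDP each
    + 9 * 900       # NVSwitch trays, midpoint of ~800-1,000 W per tray
)
air_w = (
    648 * 4         # OSFP optics, midpoint of 3-5 W each
    + 18 * 8 * 6    # E1.S NVMe: up to 8 per compute tray, midpoint 5-7 W
)

total_w = liquid_w + air_w
print(f"liquid loop: {liquid_w / 1e3:.1f} kW "
      f"({100 * liquid_w / total_w:.0f}% of component heat)")
print(f"residual air: {air_w / 1e3:.1f} kW")
# -> ~96% on liquid before counting PSU conversion losses; the air-cooled
#    share stays a distinct minority either way.
```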
The thermal path in a GB300 NVL72 deployment can be understood as five sequential layers, each performing a specific heat transfer function and each with its own set of vendors, failure modes, and performance constraints. Heat enters at Layer 1 and exits the building at Layer 5. Every layer introduces thermal resistance that reduces the available temperature budget for the layer above it.
The cold plate is the device that makes direct liquid cooling physically possible. It is a precision-machined metal block — typically brazed copper with internal microchannels — that clamps directly to the surface of the GPU die or CPU package, with a thin thermal interface material (TIM) filling the microscopic gap between the plate and the package lid. Coolant flows through the microchannels, absorbs heat conducted from the die, and exits the plate carrying that heat toward the rest of the system.
The engineering challenge at this layer is minimizing thermal resistance while maintaining mechanical reliability across thermal cycles. A GPU package in full operation and then idle will expand and contract as its temperature changes. The cold plate, the TIM, and the mounting hardware must accommodate that cycling without creating micro-cracks in solder joints or gaps in the TIM layer — either of which would increase thermal resistance and eventually cause throttling.
For the GB300 NVL72, cold plates are required for 72 GPU packages and 36 CPU packages per rack — 108 precision assemblies, each of which must maintain low thermal resistance for the platform's full lifetime.
The TIM between cold plate and package lid is as important as the plate geometry. State-of-the-art TIMs for this application are typically indium-based metallic thermal interface materials or high-performance polymer-based compounds with thermal conductivity in the 8–15 W/m·K range. The interface resistance budget is tight: for a 1,000 W GPU package with a cold plate contact area of roughly 60 cm², even a 0.05 K·cm²/W interface resistance translates to a 1°C temperature rise — which at the margins of the HBM thermal budget matters.
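The arithmetic behind that sensitivity is simple enough to make explicit. A minimal sketch using only the figures quoted in this paragraph:

```python
# Temperature rise across the TIM: dT = heat flux x interface resistance.
# All figures are the ones quoted in the paragraph above.

power_w = 1_000.0    # GPU package power
area_cm2 = 60.0      # cold plate contact area
r_int = 0.05         # K*cm^2/W, TIM interface resistance

heat_flux = power_w / area_cm2     # ~16.7 W/cm^2
dt_tim = heat_flux * r_int         # ~0.83 K
print(f"heat flux {heat_flux:.1f} W/cm^2 -> TIM rise {dt_tim:.2f} K")
# Each additional 0.05 K*cm^2/W (pump-out, voids) costs roughly another
# degree of HBM headroom at this power density.
```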
A single compute tray in the GB300 NVL72 houses two Grace-Blackwell Superchip packages — each containing two B300 GPUs and one Grace CPU — along with their cold plates, local plumbing, and power delivery hardware. The inner tray manifold is the plumbing network that distributes coolant from the rack's internal supply line to each cold plate on the tray, and collects heated coolant back from those plates to return to the rack manifold.
The key engineering element at this layer is the quick disconnect (QD) fitting — the mechanical coupling that allows a compute tray to be inserted into or removed from the rack without tools, without manually disconnecting any hoses, and ideally without spilling or introducing any coolant contamination into the system. For the GB300 NVL72, each compute tray requires 24 quick disconnect fittings — significantly more than the 10 per tray required for the GB200, reflecting the higher per-tray component count and the more complex internal plumbing of the Blackwell Ultra package.
Quick disconnects are a surprisingly high-stakes component for what appears to be simple plumbing hardware. A leaking QD on a compute tray inside a running rack can cause water damage to electronics worth millions of dollars, require emergency shutdown of a pod consuming dozens of megawatts, and result in weeks of downtime while trays are dried, inspected, and the cooling system is flushed and refilled. This is why QD supplier selection is treated as a mission-critical decision at the rack design stage.
The primary QD suppliers qualified for NVIDIA's rack generations are: CPC (Colder Products Company, US); Parker Hannifin (US) through its Parker Quick Couplings division, a leading supplier of non-spill couplings for data center liquid cooling; Danfoss (Denmark) through its hydraulics division; and Stäubli (Switzerland) with its precision fluid connectors. LOTES and Fositek (Taiwan) are completing qualification as secondary suppliers for GB300.
The coupling standard for this generation is the UQD (Universal Quick Disconnect) format for GB200 trays, with a move to specialized smaller NVIDIA-proprietary disconnects for GB300 at a unit cost of roughly $45–50 versus $70 for the UQD units — reflecting both the volume scale of production and the design optimization for NVIDIA's specific tray geometry.
The rack manifold is the internal distribution tree that connects the external CDU supply and return lines to the inner manifolds of all 18 compute trays and 9 NVLink switch trays in the rack. NVIDIA designates this as the UQD08 manifold for third-generation MGX racks — a new internal manifold design introduced with Vera Rubin but also relevant to GB300 deployments.
The manifold must maintain consistent coolant pressure and flow rate across all 27 trays (18 compute + 9 switch) simultaneously, despite the fact that each tray's cold plates have slightly different flow resistance characteristics and operate at different power levels depending on workload. A compute tray running dense prefill at 100% GPU utilization may need significantly higher coolant flow than a switch tray at moderate bandwidth load. The manifold design and flow balancing valves must accommodate this asymmetry without allowing any single tray to run hot.
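To put numbers on that asymmetry, the first-order relationship between tray power, coolant flow, and coolant temperature rise is Q = ṁ·cp·ΔT. The sketch below takes the per-tray power from the rack layout above; the 10 K loop ΔT and water-like fluid properties are illustrative assumptions, not NVIDIA specifications:

```python
# First-order coolant flow per tray: Q = mdot * cp * dT.
# Tray power follows the rack layout above; the 10 K loop dT and
# water-like fluid properties are illustrative assumptions.

tray_power_w = 4 * 1_000 + 2 * 300   # 4 B300 GPUs + 2 Grace CPUs per tray
cp = 4186.0                          # J/(kg*K), water (glycol blends run lower)
rho = 1000.0                         # kg/m^3
dt_loop = 10.0                       # K, assumed coolant rise across the tray

mdot_kg_s = tray_power_w / (cp * dt_loop)
flow_lpm = mdot_kg_s / rho * 1000 * 60    # liters per minute
print(f"{flow_lpm:.1f} L/min per compute tray at dT = {dt_loop:.0f} K")
# -> ~6.6 L/min. Halve the flow (a partially blocked QD) and the coolant
#    rise doubles -- the tray-correlated signature described in the
#    failure-mode taxonomy below.
```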
A critical design constraint at this layer is the approach temperature — the difference between the facility supply water entering the CDU and the coolant temperature the CDU can deliver to the rack manifold. The tighter the CDU holds that approach, the cooler the coolant reaching the cold plates for a given facility water temperature, and the more room the thermal system has to absorb GPU heat before coolant temperature threatens the HBM throttle threshold. For a GB300 NVL72 with a 35°C CDU supply, a 20°C cold-plate ΔT, and an HBM throttle threshold of 95°C, there is meaningful headroom — but it evaporates quickly if inlet water temperature rises due to chiller degradation or high ambient conditions.
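That headroom can be stacked up explicitly. In the sketch below, the supply temperature, coolant rise, and throttle threshold come from this paragraph; the per-interface split beneath the coolant is an illustrative assumption, not a measured breakdown:

```python
# Die-to-facility temperature budget. Supply temp, coolant rise, and
# throttle threshold are from the text; the per-interface split is assumed.

cdu_supply_c = 35.0       # coolant supplied to the rack
coolant_rise_c = 20.0     # heating of coolant through the cold plates
hbm_throttle_c = 95.0     # HBM throttle threshold

coolant_exit_c = cdu_supply_c + coolant_rise_c    # 55 C worst case
budget_c = hbm_throttle_c - coolant_exit_c        # 40 K remaining
print(f"budget above coolant: {budget_c:.0f} K")

# That 40 K must cover plate convection, plate conduction, TIM, and
# in-package spreading. Illustrative split (assumptions):
for name, dt in [("plate convection", 12.0), ("plate conduction", 5.0),
                 ("TIM interface", 3.0), ("package spreading", 10.0)]:
    budget_c -= dt
    print(f"  after {name:<18} {budget_c:5.0f} K left")
# ~10 K of slack -- and every degree of facility inlet drift comes
# straight off that number.
```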
The coolant distribution unit is where facility cooling infrastructure connects to the IT equipment. The CDU performs three essential functions: it conditions the coolant that flows into the rack (adjusting temperature and pressure), it circulates that coolant through the rack loop using internal pumps, and it exchanges heat between the rack-side secondary coolant loop and the facility-side primary loop. On the rack-side loop, the CDU pushes dielectric or water-based coolant at the right pressure to maintain flow through all the cold plates. On the facility side, it connects to the data center's chilled water plant.
The dominant CDU architecture for Blackwell generation deployments is liquid-to-liquid (L2L): facility chilled water enters one side of an internal heat exchanger, rack coolant circulates on the other side, and heat transfers across the exchanger plates. The two fluid loops never mix, which keeps the facility water loop clean and allows the rack coolant to be a different chemical composition or corrosion-inhibitor blend optimized for GPU package compatibility.
Vertiv captured over 70% of volume share for the first wave of GB200 NVL72 deployments, driven by its Liebert XDU series CDUs — specifically the XDU500 and larger variants capable of handling multiple racks. Vertiv co-developed the GB200 NVL72 reference architecture with NVIDIA, and the Liebert XDU became the de facto standard that hyperscalers deploying early Blackwell generations specified. On pricing: a single L2L CoolIT CDU supporting up to 8 NVL72 racks runs approximately $140K — about $18K of cooling infrastructure per rack — while Vertiv's Liebert XDU1350 series, at $150–200K, represents the higher end of the market.
The competitive landscape for GB300 includes:

- CoolIT Systems (private, Calgary), with its CHx2000 series delivering 2 MW of cooling capacity at a 5°C approach temperature.
- Motivair (now Schneider Electric's liquid cooling division following the $850M acquisition announced in October 2024), whose dynamic cold plate and CDU technologies are integrated into Schneider's reference architecture for NVIDIA clusters.
- nVent Electric, with new modular row and rack CDUs, including a Project Deschutes-inspired design aligned to Google's OCP specification.
- Boyd Corporation, with in-row and in-rack CDU offerings validated for GB200.
- Delta Electronics (Taiwan), a qualified CDU supplier.
- A second tier of Asia-based vendors — Gigabyte, Envicool, and MGCooling — completing qualification for GB300.
The CDU discharges heat into the facility's primary chilled water loop. That loop must carry the heat to somewhere it can be rejected to the ambient environment. There are four primary facility cooling approaches deployed at hyperscale AI data centers running Blackwell-generation infrastructure:
| Method | Typical PUE contribution | Water consumption | Climate suitability | Key vendors |
|---|---|---|---|---|
| Mechanical chiller | 1.2–1.4 | Low (closed loop) | Works in any climate | Vertiv AFC chiller, Schneider, Modine |
| Cooling tower (evaporative) | 1.05–1.15 | High (evaporation) | Humid climate challenge | Baltimore Aircoil, EVAPCO |
| Dry cooler / free cooling | 1.02–1.08 | Minimal | Works when ambient <30°C and supply temp ≥35°C | Airedale by Modine, Schneider |
| Hybrid (chiller + free cooling) | 1.08–1.20 | Low to moderate | Climate-adaptive, most flexible | Vertiv/Schneider reference designs |
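The dry-cooler row's constraint generalizes into simple economizer logic: free cooling works whenever ambient plus the dry cooler's approach temperature stays below the facility supply setpoint. A sketch of that decision — the thresholds are illustrative, not any vendor's actual control logic:

```python
# Illustrative economizer logic for the hybrid (chiller + free cooling)
# row: pick the cheapest rejection mode that still holds the facility
# supply setpoint. Thresholds are assumptions for illustration.

def select_cooling_mode(ambient_c: float, supply_setpoint_c: float,
                        dry_cooler_approach_c: float = 5.0) -> str:
    """Return which facility cooling mode can hold the supply setpoint."""
    if ambient_c + dry_cooler_approach_c <= supply_setpoint_c:
        return "free cooling (dry cooler only)"   # best PUE, no compressors
    if ambient_c <= supply_setpoint_c:
        return "hybrid (dry cooler + chiller trim)"
    return "mechanical chiller"                   # ambient above setpoint

for ambient in (18, 28, 33, 40):
    print(f"{ambient:>2} C ambient -> {select_cooling_mode(ambient, 35.0)}")
```

With a 35°C setpoint and a 5°C approach, free cooling holds until ambient crosses 30°C — which is exactly why the higher supply temperatures discussed at the end of this essay widen the free-cooling envelope.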
The Schneider Electric reference design for GB300 NVL72 — co-engineered with NVIDIA — supports up to 142 kW per rack and uses liquid-to-liquid CDUs paired with high-temperature chillers. The design covers four technical areas: facility power, facility cooling, IT space layout, and lifecycle software, with the ETAP and EcoStruxure IT Design CFD models enabling digital twin simulation of specific power and cooling scenarios before physical deployment.
Modine Manufacturing has emerged as a notable second-tier facility cooling player, with its data center segment growing 31% sequentially in Q3 FY2026. Modine's Cooling AI control system and 1 MW CDU products target the chiller and dry-cooler layers of the facility stack, with a five-year order backlog indicating sustained demand for the Blackwell and Rubin deployment cycles.
The GB300 NVL72 is described as a hybrid cooling architecture because not everything in the rack is liquid-cooled. The components that remain on air cooling are: OSFP optical transceiver modules, E1.S NVMe storage drives, and the power distribution boards (PDBs). This is not an oversight — it is a deliberate thermal engineering decision.
These components share two characteristics that keep air cooling viable: their heat flux density is far lower than a GPU die's, and their performance degradation under elevated temperature is either gradual (drives throttle; optical modules tolerate wider temperature ranges) or manageable at current densities. The residual airflow in the rack — no longer serving the GPUs or CPUs but still present from the PSU fans and slot airflow — is sufficient for these components.
The practical consequence is that GB300 deployment requires both a liquid cooling supply infrastructure and continued attention to rack airflow patterns. A data center that eliminates all in-row air cooling because "it's a liquid-cooled rack" will find its optical transceivers running hotter than specified, which shortens their service life and can increase link error rates on the ConnectX-8 NICs that connect GPUs to the scale-out fabric.
| Layer | Function | Primary vendors | Notes |
|---|---|---|---|
| L1 — Cold plate | Die-to-liquid heat transfer | Boyd, Asia Vital (AVC), Cooler Master | Boyd on NVIDIA's recommended vendor list (RVL). Per-platform custom geometry. |
| L2 — QD / inner manifold | Blind-mate tray connections | Parker Hannifin, Danfoss, CPC, Stäubli, LOTES | 450 QDs per rack. Non-spill critical. |
| L3 — Rack manifold | Rack-level coolant distribution | Cooler Master, Auras, Boyd, NVIDIA-designed UQD08 | UQD08 standard for 3rd-gen MGX. |
| L4 — CDU | L2L heat exchange, pump, conditioning | Vertiv (Liebert XDU), CoolIT, Motivair (SE), nVent, Boyd, Delta, Gigabyte | Vertiv held 70%+ share in GB200 wave. |
| L5 — Facility cooling | Chiller, dry cooler, cooling tower | Vertiv (AFC), Schneider Electric, Modine, Carrier | Schneider holds NVIDIA reference design for GB300. |
| Coolant | Heat transfer fluid | 3M (Novec candidate), Lubrizol, Engineered Fluids | Water + inhibitor for primary DTC loops. |
| TIM | Package-to-cold-plate interface | Indium Corporation, Shin-Etsu, Honeywell (PTM series) | Metallic TIM preferred for thermal cycling. |
The five-layer architecture creates five categories of failure mode, each with a distinct signature in telemetry and a different consequence for compute throughput. Understanding this taxonomy is necessary for building monitoring systems that can attribute thermal performance degradation to the correct layer.
Cold plate degradation manifests as increased thermal resistance at a specific GPU package — that GPU's HBM temperature rises faster and higher than its neighbors' under equivalent workload. The cause is typically TIM pump-out (the TIM migrates over thermal cycles until it no longer fills the gap uniformly) or micro-cracking of the cold plate brazing under repeated thermal expansion. The signature is a single-device HBM temperature outlier that worsens over months.
QD partial blockage or slow leak manifests as reduced flow through a single compute tray, causing all four GPU packages on that tray to run hotter simultaneously. The signature is a tray-correlated thermal anomaly — all GPUs on tray N run 5–8°C hotter than GPUs on adjacent trays at the same utilization. A slow leak shows up as coolant volume decrease in the CDU reservoir and a gradual rise in rack-side coolant inlet temperature.
CDU pump degradation manifests as reduced system-wide coolant flow rate, causing the entire rack to run hotter at a given workload. The signature is a rack-level thermal rise that correlates with CDU pump RPM dropping from nominal. CDUs with dual-pump redundancy can mask this until the second pump starts showing similar degradation.
Facility supply water temperature rise is the most insidious failure mode because it produces no component-level anomaly within the rack's own telemetry — no single GPU, tray, or pump looks wrong, yet HBM temperatures drift upward together over hours as chiller efficiency degrades during peak summer loads or partial chiller failures. The correct monitoring attachment point is the CDU facility water inlet sensor, not GPU temperature alone.
Manifold flow imbalance between trays can arise if a flow-balancing valve shifts or if scale or particulate accumulates in one branch of the distribution tree. The signature is systematic temperature differences between trays that should be thermally equivalent, which is detectable only if per-tray coolant flow sensors are present — a capability that not all CDU vendors expose in their management interfaces.
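That taxonomy maps naturally onto a layered attribution check that walks the stack outside-in, since the outer layers (facility, CDU) produce rack-wide signatures that would otherwise mask inner-layer anomalies. A minimal sketch of the decision logic — the telemetry field names and thresholds are hypothetical, not any CDU vendor's management API:

```python
from dataclasses import dataclass

@dataclass
class RackTelemetry:
    """Hypothetical per-rack snapshot; field names are illustrative."""
    facility_inlet_c: float             # CDU facility-water inlet (Layer 5)
    cdu_pump_rpm_pct: float             # pump speed, % of nominal (Layer 4)
    tray_mean_hbm_c: dict[int, float]   # tray id -> mean HBM temp (Layers 2-3)
    gpu_hbm_c: dict[str, float]         # gpu id -> HBM temp (Layer 1)

def attribute_thermal_anomaly(t: RackTelemetry) -> str:
    """Walk the five layers outside-in and name the first suspect."""
    # Layer 5: facility water drift has no in-rack outlier, so check first.
    if t.facility_inlet_c > 37.0:
        return "L5 facility: supply water above design inlet"
    # Layer 4: rack-wide rise tracking pump degradation.
    if t.cdu_pump_rpm_pct < 90.0:
        return "L4 CDU: pump below nominal, rack-wide flow deficit"
    # Layers 2-3: tray-correlated hot spot (QD blockage, manifold imbalance).
    tray_mean = sum(t.tray_mean_hbm_c.values()) / len(t.tray_mean_hbm_c)
    for tray, temp in t.tray_mean_hbm_c.items():
        if temp - tray_mean > 5.0:
            return f"L2/L3 tray {tray}: tray-correlated hot spot"
    # Layer 1: single-device outlier (TIM pump-out, plate degradation).
    gpu_mean = sum(t.gpu_hbm_c.values()) / len(t.gpu_hbm_c)
    for gpu, temp in t.gpu_hbm_c.items():
        if temp - gpu_mean > 8.0:
            return f"L1 {gpu}: single-device outlier, suspect cold plate/TIM"
    return "no attribution: within normal envelope"
```

The ordering matters: a warm facility loop raises every downstream reading, so checking inner layers first would misattribute a chiller problem to dozens of apparently degraded cold plates.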
The Blackwell cooling architecture — five layers, hybrid liquid-plus-air, 30–35°C CDU supply, QD-intensive tray plumbing — is the current production standard. Vera Rubin NVL72 changes several of its fundamental constraints. It is 100% liquid-cooled with no air-cooled components remaining in the compute or switch trays. It operates at a 45°C supply temperature rather than 30–35°C, unlocking free-cooling in a much wider range of climates. And it replaces the existing QD-and-hose tray interconnect with a blind-mate PCB midplane that eliminates hoses, cables, and most of the manual assembly points entirely.
Each of those changes has a supply chain consequence. The 45°C supply temperature creates demand for CDUs designed around higher approach temperatures. The fan-free, hose-free rack design means QD manufacturers face a structural reduction in rack-level QD count — replaced by the PCB midplane vendors that supply the new connector system. And the liquid-cooled busbar — a new requirement in Vera Rubin that did not exist in Blackwell — creates a component category with no current volume supply ecosystem.
The thermal chain from die to facility does not change in its essential logic between Blackwell and Vera Rubin. What changes is where the engineering leverage and commercial opportunity sit at each layer. That is the subject of the companion essay.
Manish KL writes about AI infrastructure, memory systems, accelerator architecture, and cooling systems. Related essays: AI Cluster Reliability Beyond Fault-Tolerant Parallelism · The Next AI Cluster Failure Won't Look Like a GPU Failure · The Real AI Bottleneck Is Moving From Compute to Interconnect Power Density
© 2026 Manish KL