## The bottom-up accounting
Let's build the storage budget for a hypothetical 100,000-GPU cluster based on the DGX GB200 architecture. This is a simplified but representative model of what hyperscale AI deployments actually look like in 2026.
| Storage Tier | Per Node | Per 1,000 GPUs | Per 100,000 GPUs | NAND Type |
|---|---|---|---|---|
| Local NVMe (hot) | 15.4 TB | 3.9 PB | 385 PB | TLC (high endurance) |
| Local NVMe (warm staging) | ~30 TB (optional) | 7.5 PB | 750 PB | TLC/QLC mixed |
| Networked flash (checkpoints) | shared | ~5 PB | ~500 PB | QLC (capacity-optimized) |
| Dataset lake (flash tier) | shared | ~10 PB | ~1 EB | QLC (read-optimized) |
| Model repository | shared | ~0.5 PB | ~50 PB | QLC |
The total comes to 2.5–3 exabytes of NAND flash across all tiers. At current enterprise SSD pricing ($0.08–0.15/GB), the flash bill alone is $200–450 million, a figure rarely discussed alongside the GPU cost.
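The table's totals can be checked with simple arithmetic. A minimal sketch, using the per-100,000-GPU capacities and the quoted $/GB price band from the text (tier names are illustrative labels, not product SKUs):

```python
# Per-100,000-GPU capacities from the table, in PB.
tiers_pb = {
    "local_nvme_hot": 385,
    "local_nvme_warm": 750,
    "checkpoint_flash": 500,
    "dataset_lake_flash": 1000,
    "model_repository": 50,
}
price_per_gb = (0.08, 0.15)  # enterprise SSD $/GB band

total_pb = sum(tiers_pb.values())   # 2,685 PB
total_eb = total_pb / 1000          # ~2.7 EB, matching "2.5-3 EB"

# 1 PB = 1e6 GB, so dollars = PB * 1e6 * $/GB; divide by 1e6 for $M.
cost_low_m = total_pb * 1e6 * price_per_gb[0] / 1e6    # ~$215M
cost_high_m = total_pb * 1e6 * price_per_gb[1] / 1e6   # ~$403M
print(total_eb, cost_low_m, cost_high_m)
```

The result, roughly $215–403 million, lands inside the $200–450 million band quoted above.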
## Where the capacity goes
### Local NVMe: the latency-bounded staging layer
Each DGX GB200 compute tray ships with 4× 3.84 TB E1.S NVMe drives, approximately 15.4 TB of local flash. In a 100,000-GPU cluster with ~25,000 compute trays (4 GPUs per tray), this totals 385 PB of local NVMe. This capacity serves the latency-sensitive roles: checkpoint absorption, KV-cache offload, weight staging, and dataset hydration.
At scale, operators increasingly add a second tier of warm staging — 4× 7.68 TB or 8× 7.68 TB drives per node — for model swapping and multi-model inference. This can push per-node local flash to 30–60 TB, and total cluster local flash toward 750 PB–1.5 EB.
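The local-flash figures above follow directly from the per-tray configuration; the only assumption beyond the text is 4 GPUs per compute tray (implied by 100,000 GPUs across ~25,000 trays):

```python
# Local NVMe per tray and per cluster, from the text's figures.
drive_tb = 3.84          # E1.S drive capacity
drives_per_tray = 4
gpus_per_tray = 4        # assumption: 100,000 GPUs / ~25,000 trays
cluster_gpus = 100_000

per_tray_tb = drives_per_tray * drive_tb   # 15.36 TB ("~15.4 TB")
trays = cluster_gpus // gpus_per_tray      # 25,000 trays
cluster_pb = trays * per_tray_tb / 1000    # ~384 PB ("~385 PB")

# Optional warm-staging tier: 4x or 8x 7.68 TB drives per node.
warm_tb_range = (4 * 7.68, 8 * 7.68)       # ~30.7 to ~61.4 TB ("30-60 TB")
```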
### Checkpoint storage: the write-heavy reservoir
A 405B model checkpoint at full optimizer state is ~3.2 TB. During a training run that checkpoints every 15 minutes, the cluster generates approximately 13 TB of checkpoint data per hour. Over a 90-day training run, that is ~28 PB of cumulative checkpoint writes. Older checkpoints are pruned as the run progresses, however, so the steady-state reservoir is typically the 5–10 most recent checkpoints (16–32 TB).
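The checkpoint cadence math, as a quick sketch (the 3.2 TB full-state checkpoint size is the text's figure):

```python
ckpt_tb = 3.2            # 405B model, full optimizer state (text's figure)
interval_min = 15        # checkpoint every 15 minutes
run_days = 90

per_hour_tb = ckpt_tb * (60 / interval_min)          # 12.8 -> "~13 TB/hour"
cumulative_pb = per_hour_tb * 24 * run_days / 1000   # ~27.6 PB -> "~28 PB"

# Pruned steady-state reservoir: the 5-10 most recent checkpoints.
reservoir_tb = (5 * ckpt_tb, 10 * ckpt_tb)           # (16.0, 32.0) TB
```

Note the units: the pruned reservoir is tens of terabytes per run; the ~500 PB networked checkpoint tier in the table serves many concurrent jobs plus retention policies.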
Checkpoint storage is typically served by a high-performance parallel file system (VAST, Weka, GPFS) backed by all-flash NVMe arrays. These systems require high sequential write bandwidth (to absorb checkpoint bursts) and high sequential read bandwidth (to recover from failures). The NAND underneath is typically QLC for cost efficiency, with TLC caching for write burst absorption.
### The dataset lake
Training datasets for frontier models range from 10–100+ TB of tokenized data. Multi-modal training (text + images + video + code) pushes this toward petabyte scale. The flash tier of the dataset lake serves hot partitions — the data being actively consumed by the current training epoch — while cold partitions reside on object storage (S3, GCS) or tape.
## The NAND supply pressure
A single 100,000-GPU cluster consuming 2.5–3 EB of NAND claims a meaningful slice of global NAND production: roughly 0.6–0.9% of the 350–400 EB of raw capacity the market produced in 2025, of which enterprise SSD shipments accounted for about 30–40%. Three or four hyperscale AI clusters of this size would consume 7–12 EB, or roughly 5–8% of total global enterprise NAND supply.
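A rough sketch of that supply-fraction claim, pairing the low and high ends of the text's ranges:

```python
global_nand_eb = (350, 400)        # 2025 raw NAND output (text's estimate)
enterprise_share = (0.30, 0.40)    # enterprise SSD share of that output
cluster_eb = (2.5, 3.0)            # NAND in one 100,000-GPU cluster
n_clusters = (3, 4)

demand_eb = (n_clusters[0] * cluster_eb[0],
             n_clusters[1] * cluster_eb[1])         # (7.5, 12.0) EB
enterprise_eb = (global_nand_eb[0] * enterprise_share[0],
                 global_nand_eb[1] * enterprise_share[1])   # (~105, ~160) EB

# Pairing low/low and high/high ends gives ~7% either way,
# consistent with the "5-8%" figure in the text.
frac = (demand_eb[0] / enterprise_eb[0],
        demand_eb[1] / enterprise_eb[1])            # (~7.1%, ~7.5%)
```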
This concentration of demand is already reshaping the NAND market. SK Hynix, Samsung, and Kioxia have shifted manufacturing priorities toward high-density enterprise QLC (321-layer and beyond) specifically to serve AI capacity needs. The enterprise SSD product lines are bifurcating into "AI-optimized" (read-heavy, high-IOPS, moderate endurance) and "general enterprise" (balanced read/write, standard endurance) — a segmentation driven entirely by the distinctive access patterns of AI workloads.
## The form factor evolution
AI cluster storage is driving a form factor transition from the legacy 2.5" U.2 and M.2 drives toward the E1.S and E3.S EDSFF (Enterprise and Data Center SSD Form Factor) standards:
| Form Factor | Typical Capacity | Thermal Design | Key Advantage for AI |
|---|---|---|---|
| M.2 2280 | 1–4 TB | Passive (board-mounted) | Boot drives only; too capacity- and thermally limited for the data path |
| U.2 (2.5") | 4–30 TB | Air-cooled chassis | Legacy; used in DGX H100. High-capacity but poor thermal density |
| E1.S | 4–16 TB | Hot-swap, direct airflow | DGX GB200 standard. Excellent thermal management, hot-swappable |
| E3.S | 8–64 TB | Direct-attach, liquid-cooling compatible | Next-gen AI: higher capacity, designed for 50–100 kW rack densities |
The transition to E1.S and E3.S is driven by thermal necessity: AI racks at 50–100 kW density cannot provide sufficient airflow for 2.5" drives deep inside the chassis. E1.S and E3.S drives are designed with direct-airflow or conduction-cooling paths that maintain NAND operating temperatures below 85°C even in high-density, liquid-cooled configurations.
## The 256 TB drive and what it means
Samsung's PM1763 — a PCIe Gen 5 E3.S drive shipping in 2026 — offers up to 256 TB in a single drive, with 512 TB Gen 6 variants planned for 2027. SK Hynix is demonstrating 245 TB eSSD prototypes using 321-layer QLC NAND. These capacities were unthinkable five years ago and are being driven almost entirely by AI demand.
A single 256 TB drive can hold approximately 1,800 copies of a 70B model at fp16 (~140 GB each), or about 80 full 3.2 TB checkpoints of a 405B model, many times the pruned checkpoint reservoir of a training run. At scale, this means a checkpoint reservoir that previously required 100+ drives can be served by a handful. The implications for rack density, cabling, and failure domains are significant.
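The per-drive math, assuming fp16 at 2 bytes per parameter and the 3.2 TB checkpoint figure from earlier:

```python
drive_tb = 256
params_70b = 70e9        # 70B-parameter model
bytes_fp16 = 2           # bytes per parameter at fp16
ckpt_405b_tb = 3.2       # full-state 405B checkpoint (text's figure)

model_copy_tb = params_70b * bytes_fp16 / 1e12   # 0.14 TB per fp16 copy
copies = drive_tb / model_copy_tb                # ~1,828 copies per drive
checkpoints = drive_tb / ckpt_405b_tb            # ~80 full 405B checkpoints
```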
## Where this is heading
By 2028, a 500,000-GPU "frontier cluster" will likely require 10–15 EB of total NAND capacity — approaching 3–4% of projected global annual NAND production. This creates genuine supply-chain exposure: a single hyperscaler's cluster build can measurably tighten the global NAND market, driving price increases that ripple through consumer electronics, enterprise IT, and automotive sectors.
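As a sanity check on that projection: a 10–15 EB cluster taking a 3–4% share implies annual NAND production in the few-hundred-EB range, consistent with today's ~350–400 EB plus a few years of growth.

```python
frontier_nand_eb = (10, 15)   # projected NAND in a 500,000-GPU cluster
share = (0.03, 0.04)          # stated 3-4% of global annual production

# Production implied by each (capacity, share) pairing, in EB/year.
implied_low = frontier_nand_eb[0] / share[1]    # 10 EB at 4% -> 250 EB/yr
implied_high = frontier_nand_eb[1] / share[0]   # 15 EB at 3% -> 500 EB/yr
```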
The storage industry's response is threefold: higher-density NAND (300+ layer V-NAND with QLC and PLC/5-bit-per-cell), AI-optimized SSD controllers (100M+ IOPS, GPUDirect Storage support, direct GPU-to-flash data paths), and new memory tiers (HBF, CXL-attached flash) that blur the line between storage and memory. The GPU gets the headlines, but the NAND gets the purchase orders.
AI clusters are the largest concentrated consumers of NAND flash in history. The GPU determines what work gets done. The flash determines whether the GPU stays fed. Increasingly, the binding constraint is the SSD.