
AI Storage Primer: From NAND Physics to GPUDirect and the Memory Wall

This is a primer, not a hype piece. The goal is to explain AI storage from first principles: what storage is, how SSDs and HDDs actually work, why NAND design choices matter, what NVMe changes, why GPUs still starve, how GPUDirect Storage alters the data path, and why ideas like High Bandwidth Flash exist at all. Modern AI is not just a compute problem. It is a storage-and-data-movement problem wrapped around compute.


The thesis

The right way to think about AI storage is not “what drive is fastest?” It is “what data belongs on what tier, with what latency, bandwidth, endurance, and cost profile?” HDD, SSD, DRAM, and HBM are not substitutes. They are layers in a hierarchy. AI hurts when the hierarchy is badly matched to the workload.

Persistence: Storage exists because memory is fast but volatile and too expensive to hold everything.
Latency: The difference between nanoseconds, microseconds, and milliseconds defines what can sit on the hot path.
Bandwidth: AI systems often care less about peak device specs than about sustained end-to-end data flow.
IOPS: Random access patterns, metadata churn, and vector retrieval can break systems even when sequential throughput looks good.

1. Storage Basics: Persistence, Latency, Bandwidth, and IOPS

A good storage mental model starts with a simple distinction: memory is where active state lives right now; storage is where state survives. DRAM and HBM are fast but volatile. Power them off and their contents disappear. HDDs and SSDs are persistent. They survive reboots, failures, and restarts.

Latency: The time to begin getting the data. Nanoseconds matter for memory. Microseconds matter for flash. Milliseconds matter for disks.
Bandwidth: How many bytes per second can flow once transfers are underway. AI training and checkpointing often care about this intensely.
IOPS: How many small operations per second a device can sustain. Random read-heavy systems like vector retrieval care here.
Persistence: Whether data survives power loss. Persistent media is essential for datasets, checkpoints, logs, and model distribution.

AI workloads stress all four at once. Training wants huge throughput. Retrieval wants lots of random reads. Checkpointing wants sustained write bandwidth. Long-context systems force hot and cold state to move across tiers. That is why storage design in AI is not just “buy faster SSDs.” It is workload matching.
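The interplay of latency, bandwidth, and IOPS can be made concrete with a back-of-envelope model. The sketch below uses rough, illustrative device figures (they are assumptions, not specs for any particular drive) to show why the same device looks fast for sequential streaming and much slower for small random reads:

```python
# Back-of-envelope model: how long does an I/O pattern take on a given medium?
# Latency/bandwidth/IOPS figures below are rough illustrative assumptions.

def io_time_seconds(total_bytes, request_bytes, latency_s, bandwidth_bps, max_iops):
    """Time to move total_bytes in request_bytes chunks, issued serially."""
    n_requests = total_bytes / request_bytes
    # Each request pays latency plus transfer time, floored by the IOPS cap.
    per_request = max(latency_s + request_bytes / bandwidth_bps, 1.0 / max_iops)
    return n_requests * per_request

GiB = 1024**3

# Sequential streaming: 100 GiB in 1 MiB requests from a ~7 GB/s NVMe-class SSD.
seq = io_time_seconds(100 * GiB, 1024**2, 100e-6, 7e9, 1_000_000)

# Random retrieval: the same 100 GiB in 4 KiB requests from the same device.
rand = io_time_seconds(100 * GiB, 4096, 100e-6, 7e9, 1_000_000)

print(f"sequential: {seq:.1f} s, random 4 KiB: {rand:.1f} s")
```

With these numbers the random pattern is roughly two orders of magnitude slower than the sequential one on identical hardware, which is why workload shape, not the spec sheet, decides whether a tier is adequate.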

2. NAND Flash from First Principles

SSDs are built on NAND flash. At a very high level, NAND stores charge in cells. The more bits you try to pack into each cell, the cheaper and denser the storage becomes, but the harder it gets to read and write quickly and reliably.

Type | Bits per cell | Best property                      | Main compromise
SLC  | 1             | Fastest, most durable              | Too expensive for broad capacity use
MLC  | 2             | Balanced performance/endurance     | Lower density than TLC/QLC
TLC  | 3             | Good mainstream tradeoff           | More fragile and slower than lower-bit cells
QLC  | 4             | Maximum density / low cost per bit | Lower endurance and weaker write behavior

This is why QLC exists: AI infrastructure wants absurd capacity, and capacity has to be paid for. If the hot path does not require the endurance or write behavior of TLC, then QLC becomes attractive because it pushes cost per terabyte down.
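The bits-per-cell tradeoff follows from simple arithmetic: an n-bit cell must distinguish 2^n charge levels within roughly the same physical voltage window, so the margin between adjacent levels shrinks sharply. A minimal sketch (the margin model is a deliberate simplification):

```python
# Why more bits per cell is harder: an n-bit NAND cell must resolve 2**n
# distinct charge levels, so the spacing between adjacent levels shrinks
# roughly as 1/(2**n - 1) of the cell's usable voltage window.
# This is a simplified model, not a device physics calculation.

def levels(bits_per_cell):
    return 2 ** bits_per_cell

def relative_margin(bits_per_cell):
    # Fraction of the voltage window separating adjacent levels.
    return 1.0 / (levels(bits_per_cell) - 1)

for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3), ("QLC", 4)]:
    print(f"{name}: {levels(bits)} levels, margin ~{relative_margin(bits):.0%} of window")
```

QLC's sixteen levels leave only a sliver of margin per level, which is exactly why it trades speed and endurance for density.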

Program/erase cycles and endurance

NAND cells wear out. They can only tolerate a limited number of program/erase cycles before reliability degrades. That is why SSD endurance is a real design variable, not a footnote. AI checkpointing, heavy logging, and write-heavy preprocessing can stress endurance much more than read-mostly serving.
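Endurance budgeting reduces to simple division: rated terabytes written (TBW) against the daily write load. The figures below (drive rating, checkpoint size, cadence) are hypothetical, chosen only to show the shape of the calculation:

```python
# Endurance arithmetic: how long a drive's rated write budget lasts under a
# given load. The TBW rating and checkpoint cadence are illustrative
# assumptions, not specs for any real product.

def endurance_days(rated_tbw, daily_writes_tb):
    """Days until the rated terabytes-written budget is exhausted."""
    return rated_tbw / daily_writes_tb

# Hypothetical 8 TB TLC drive rated for 14,000 TBW.
rated_tbw = 14_000

# Checkpointing 500 GB of model state every hour, around the clock:
daily_writes_tb = 0.5 * 24  # 12 TB/day

print(f"{endurance_days(rated_tbw, daily_writes_tb):.0f} days of rated endurance")
```

Under this load the drive burns through its rated endurance in roughly three years, which is why checkpoint-heavy tiers are sized and provisioned differently from read-mostly serving tiers.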

Write amplification

NAND cannot overwrite arbitrarily the way RAM can. Data is written in pages, but erasure happens in larger blocks. That mismatch means the SSD controller often has to move and rewrite more data internally than the host actually requested. That is write amplification. In practical terms, it hurts both performance and endurance.

SSD performance is not just about the flash chips. It is about how intelligently the controller hides NAND’s awkward physics from the software above it.

3. Inside an SSD: Controller, Channels, FTL, and DRAM

An SSD is not “just fast flash.” It is a storage computer. The controller schedules reads and writes across many NAND packages and channels in parallel, manages wear leveling, garbage collection, error correction, mapping tables, and often a DRAM cache.

Channels: SSD throughput comes from parallelism. More channels and packages mean more simultaneous operations.
FTL: The Flash Translation Layer maps logical block addresses to physical flash locations and hides erase-block awkwardness.
DRAM cache: Often stores mapping metadata and helps performance; DRAM-less designs can be cheaper but usually give up some behavior.

The right mental model is: SSD performance = flash media + controller architecture + parallelism + firmware policy. Two SSDs with similar NAND may behave very differently because of controller and firmware choices.
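The FTL's core trick can be shown as a toy mapping table: overwrites never touch the old physical page in place; they program a fresh page and mark the old one invalid for later garbage collection. This is a minimal sketch of the idea, not how any real firmware works:

```python
# A toy Flash Translation Layer: logical block addresses map to physical
# pages, overwrites go to a fresh page (NAND cannot overwrite in place),
# and the superseded page is only marked invalid for garbage collection.
# Minimal sketch for intuition, not real firmware behavior.

class ToyFTL:
    def __init__(self, num_pages):
        self.mapping = {}                  # logical addr -> physical page
        self.free_pages = list(range(num_pages))
        self.invalid_pages = set()         # garbage awaiting erase

    def write(self, logical_addr):
        new_page = self.free_pages.pop(0)  # always program a fresh page
        old_page = self.mapping.get(logical_addr)
        if old_page is not None:
            self.invalid_pages.add(old_page)  # old copy becomes garbage
        self.mapping[logical_addr] = new_page
        return new_page

ftl = ToyFTL(num_pages=8)
ftl.write(0)   # first write of LBA 0 lands on a fresh page
ftl.write(0)   # overwrite: new page programmed, old page marked invalid
print(ftl.mapping, ftl.invalid_pages)
```

Even this toy version shows why firmware policy matters: when and how those invalid pages are reclaimed is exactly the garbage-collection behavior that separates two drives built on similar NAND.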

This matters in AI because workloads are mixed. A training-stage staging SSD may see huge sequential reads. A vector system may see random reads. A checkpoint SSD may see bursts of large writes. The same device class can look great in one workload and mediocre in another.

4. NVMe and PCIe: How Data Really Moves

NVMe matters because the older SATA/AHCI stack was built for a different era. NVMe was designed for solid-state storage and deep parallelism. It uses PCIe directly and supports many queues with deep queue depth. In plain language: NVMe is the protocol/storage stack that lets SSDs behave more like parallel devices and less like single-file storage appliances.

Legacy SATA / AHCI: few queues, shallow parallelism; designed before SSD concurrency.
NVMe over PCIe: many queues, deep queue depth; matches SSD internal parallelism; a much better fit for AI-scale data flow.
NVMe matters because SSDs are parallel devices. The software stack has to let that parallelism surface.

PCIe lane count matters too. A PCIe x4 SSD is not just “a drive”; it is a device attached to a fabric with defined bandwidth and contention properties. In AI systems, that matters because multiple SSDs, NICs, and accelerators often compete on the same host root complexes and switches.
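The fabric math is worth keeping at hand. Per-lane usable bandwidth (after 128b/130b encoding overhead) is roughly 0.985 GB/s for PCIe Gen3 and doubles each generation, so lane count and generation bound what any attached device can deliver:

```python
# PCIe link bandwidth per direction, from lane count and generation.
# Per-lane rates are the commonly cited usable figures after encoding
# overhead (~0.985 GB/s for Gen3, doubling each generation).

PER_LANE_GBPS = {3: 0.985, 4: 1.969, 5: 3.938}  # GB/s per lane, one direction

def link_bandwidth_gbps(gen, lanes):
    return PER_LANE_GBPS[gen] * lanes

# A Gen4 x4 slot tops out near 7.9 GB/s, which is why Gen4 x4 NVMe drives
# cluster around 7 GB/s sequential reads.
print(f"Gen4 x4: {link_bandwidth_gbps(4, 4):.1f} GB/s")
print(f"Gen5 x4: {link_bandwidth_gbps(5, 4):.1f} GB/s")
```

When several SSDs, NICs, and accelerators share the same root complex or switch, it is these per-link ceilings, not device spec sheets, that set the contention budget.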

5. Why HDD Still Matters in AI

HDD is not dead in AI. It simply occupies a different place in the hierarchy. Large AI datasets, logs, archived checkpoints, compliance retention, replay data, and cold object stores do not disappear because GPUs got faster. In fact, the more AI generates and consumes data, the more attractive cheap bulk storage becomes.

Seagate’s public AI-storage positioning leans directly into this point, arguing that high-capacity hard-drive storage paired with SSD caching remains useful for AI pipelines and that NVMe connectivity can make HDD easier to integrate into modern storage fabrics. WD’s own AI Data Cycle also explicitly keeps HDD in the picture as part of the cost-optimized storage mix. That is the real answer: HDD survives because the economics of cheap capacity still matter at AI scale.

6. HBM vs DRAM vs SSD vs HDD: The Hierarchy That Actually Exists

AI pain often comes from pretending these layers are interchangeable when they are not.

Tier     | Persistence | Typical latency class            | Typical bandwidth class   | Primary AI role
HBM      | No          | sub-microsecond / memory-class   | TB/s class                | accelerator-local active working set
DRAM     | No          | ~100 ns order                    | tens to hundreds of GB/s  | host memory, staging, buffers, metadata
NVMe SSD | Yes         | tens to hundreds of microseconds | GB/s class                | hot persistent data, checkpoints, vectors, spill
HDD      | Yes         | milliseconds                     | hundreds of MB/s          | bulk lake, archive, cold retention
The hierarchy gap is the whole story. HBM can be orders of magnitude faster than SSD, and SSD can still be orders of magnitude faster than HDD. AI systems hurt when they are forced to operate across that gap carelessly.
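The gap is easiest to feel in seconds. Using rough class-level bandwidth figures (assumptions matching the table, not device specs), here is how long moving a 1 TB working set takes at each tier:

```python
# The hierarchy gap in numbers: time to move a 1 TB working set at each
# tier's bandwidth class. Figures are rough class estimates, not specs.

TB = 1e12
TIER_BANDWIDTH_BPS = {
    "HBM": 3e12,        # TB/s class
    "DRAM": 100e9,      # hundreds of GB/s
    "NVMe SSD": 7e9,    # GB/s class
    "HDD": 250e6,       # hundreds of MB/s
}

for tier, bw in TIER_BANDWIDTH_BPS.items():
    print(f"{tier:>9}: {TB / bw:>8.1f} s to move 1 TB")
```

A fraction of a second at HBM speeds stretches past an hour at HDD speeds. Any system that casually forces a working set across that span pays for it in wall-clock time.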

7. How AI Workloads Stress Storage

Training ingest: Usually throughput-heavy and sequential, but can become ugly if preprocessing or shuffling creates many random accesses.
Checkpointing: Large writes with strong durability expectations; burst handling and sustained bandwidth matter.
Inference + retrieval: Often random-read heavy and latency-sensitive; vector indexes and embeddings can stress IOPS more than raw throughput.
Telemetry/logging: Steady write streams that can quietly become huge at fleet scale and push cold tiers back into strategic importance.

This is why a storage primer for AI cannot just talk about device peak speed. Workload shape matters. Sequential training data streaming is a different problem from low-latency vector retrieval. Checkpointing is a different problem from log retention. The correct storage design almost always begins with the I/O profile.
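Checkpointing makes the point concretely. The stall (or trailing write window) is just state size over sustained write bandwidth, so striping across drives is the obvious lever. Sizes and bandwidths below are illustrative assumptions:

```python
# Checkpoint stall estimate: how long one checkpoint takes to land, as a
# function of state size and sustained write bandwidth. All figures are
# illustrative assumptions, not measurements.

def checkpoint_seconds(state_bytes, sustained_write_bps, num_drives=1):
    """Time to land one checkpoint striped evenly across num_drives."""
    return state_bytes / (sustained_write_bps * num_drives)

GB = 1e9

# A large model's parameters plus optimizer state can easily reach ~1 TB.
state = 1000 * GB

print(f"1 drive  @ 3 GB/s sustained: {checkpoint_seconds(state, 3 * GB):.0f} s")
print(f"8 drives @ 3 GB/s sustained: {checkpoint_seconds(state, 3 * GB, 8):.0f} s")
```

Note the qualifier "sustained": peak write specs often reflect an SLC-mode cache, and a multi-hundred-gigabyte checkpoint will blow straight through it, which is why the I/O profile, not the headline number, drives the design.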

8. Seagate, WD, and Sandisk: Three Different Bets

WD: the AI Data Cycle framing

WD’s strongest public contribution is conceptual. Its AI Data Cycle framework argues that AI needs a storage mix aligned to workflow stage, not a single hero medium. WD has also tied specific products like its Gen5 SN861 SSD to AI-heavy deployments, including certification messaging around NVIDIA GB200 NVL72 systems.

Seagate: cheap capacity is strategic, not old-fashioned

Seagate’s public AI storage writing stresses that AI’s data footprint keeps expanding and that large-capacity HDDs paired with faster tiers still make architectural sense. The interesting part of Seagate’s argument is not nostalgia for disks. It is the claim that AI needs economically scalable capacity, and that means storage hierarchy, not flash monoculture.

Sandisk: push flash upward

Sandisk’s public positioning is the most aggressive on flash evolution: very large enterprise SSDs, AI-oriented QLC roadmaps, and HBF as a response to the AI memory wall. That is a useful signal. Sandisk is effectively arguing that flash is being asked to move closer to the memory plane.

9. GPUDirect Storage: When the CPU Becomes Too Expensive in the Path

NVIDIA describes GPUDirect Storage as a direct data path between local or remote storage and GPU memory that avoids extra copies through CPU memory. In practical terms, that means DMA engines near storage or network adapters can move data more directly into GPU memory. The reason this matters is simple: once datasets get large enough, bouncing everything through the CPU becomes both a latency cost and a bandwidth tax.

Traditional path: SSD → CPU → DRAM → GPU. GPUDirect Storage path: SSD → GPU.
The whole point of GPUDirect Storage is that the CPU can become an unnecessary middleman for high-volume AI data movement.

GPUDirect Storage does not magically erase bad storage design. It simply removes one kind of overhead and exposes the next bottleneck more clearly.
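The "bandwidth tax" of the bounce buffer can be shown with a simple byte-count model: in the traditional path each payload byte crosses the PCIe fabric twice (storage to host DRAM, then host DRAM to GPU), while the direct path crosses once. This is a simplified accounting that ignores caching and overlap, just to show where the tax comes from:

```python
# Byte-count model of the two data paths. Traditional path: each payload
# byte crosses the PCIe fabric twice (SSD -> host DRAM, DRAM -> GPU).
# Direct path: once (SSD -> GPU). Simplified accounting; ignores caches,
# overlap, and per-transfer overheads.

def bytes_on_fabric(payload_bytes, direct_path):
    hops = 1 if direct_path else 2
    return payload_bytes * hops

payload = 100 * 10**9  # 100 GB of training shards

traditional = bytes_on_fabric(payload, direct_path=False)
direct = bytes_on_fabric(payload, direct_path=True)
print(f"traditional path moves {traditional / direct:.0f}x the bytes of the direct path")
```

Halving the bytes on the fabric (and freeing host DRAM bandwidth and CPU cycles along the way) is the entire value proposition; everything downstream of that, as the text notes, is still ordinary storage design.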

10. Why High Bandwidth Flash Exists

High Bandwidth Flash is revealing because it exists only if the current hierarchy is too painful. Sandisk describes HBF as a new form of NAND aimed at the AI memory wall, and publicly positions it as delivering far more capacity than HBM while chasing enough bandwidth to matter for AI inference-scale workloads.

What HBF tries to do: Move flash upward so it behaves less like backend storage and more like an expanded near-memory capacity tier.
What problem it targets: The gap between HBM capacity and AI working-set size, especially for long-context and memory-hungry inference.
What it cannot erase: Flash still has very different latency and write behavior than DRAM/HBM, so software policy remains central.
HBF is what happens when NAND realizes it must compete with memory, not just with storage.

Conclusion

AI storage is not an afterthought. It is the persistent half of the memory hierarchy. HDD remains the economic base layer. SSD is the active persistent tier. GPUDirect Storage shortens the path to accelerators. HBF is an attempt to close the widening gap between storage and memory. The bigger the models get, the more storage architecture starts to feel like core AI infrastructure rather than backend plumbing.

Selected references

  1. Western Digital AI Data Cycle framework
  2. Seagate on NVMe HDDs and AI storage
  3. Sandisk on HBF and the AI memory wall
  4. NVIDIA GPUDirect Storage
  5. NVIDIA GPUDirect Storage overview guide

This primer combines stable systems explanations with current public vendor positioning where relevant.