The short version
There are two different conversations hidden inside the phrase “flash memory in servers.” The first conversation is about platform firmware, where NOR dominates. The second is about persistent data storage, where NAND dominates. AI servers need both, but for completely different reasons.
What is NOR flash?
NOR flash is non-volatile memory optimized for reliable random reads and direct addressing. It behaves much more like a read-mostly memory device than like a traditional disk. In many systems, the processor or service controller can map it directly and fetch instructions from it.
Why engineers choose NOR
NOR is attractive when the payload is relatively small, must be trustworthy, and needs predictable access. Firmware images fit this pattern almost perfectly.
What it is bad at
NOR is not economical for high-capacity storage. It has lower density and a much higher cost per bit than NAND, so it does not make sense as the bulk storage medium for AI data.
- Fast random reads
- Read-mostly usage pattern
- Often suitable for execute-in-place firmware paths
- Lower density and higher cost per bit
- Usually stores firmware, recovery slots, and secure boot anchors
Where NOR is used in AI servers
In a real AI server, NOR usually stores the motherboard firmware, the BMC image, and firmware for supporting devices such as NICs, DPUs, retimers, or small controller subsystems. This is why NOR matters even in a machine whose main business is moving tensors: without NOR, the box does not boot, recover, or manage itself cleanly.
What is NAND flash?
NAND flash is non-volatile memory optimized for density and capacity. It is designed around page and erase-block operations rather than arbitrary byte writes. That makes it a far better medium for SSDs than for directly executable firmware.
NAND is the technology that turned flash from a firmware medium into a storage tier.
- High density and lower cost per bit
- Ideal for enterprise SSD capacity scaling
- Requires ECC, wear management, and bad-block handling
- Usually accessed through a controller rather than directly by applications
- Dominant persistent local storage medium in AI servers
Where NAND is used in AI servers
In AI systems, NAND usually lives inside NVMe SSDs. Those drives hold dataset shards, checkpoints, container layers, local model caches, vector indexes, telemetry, and OS images. When an AI team says “the flash tier,” they almost always mean NAND-backed SSDs.
2D vs 3D NAND
2D NAND, also called planar NAND, arranges cells on the surface of the silicon. 3D NAND stacks cells vertically across many layers. The shift from 2D to 3D is one of the biggest reasons modern enterprise SSDs reached the capacities AI infrastructure now expects.
Why 2D NAND ran into trouble
As planar NAND cells shrank, manufacturers ran into higher interference, more leakage, tighter process margins, and worse scaling economics. That made continued 2D shrink harder and less rewarding.
Why 3D NAND changed the economics
3D NAND improved density by adding vertical layers instead of only shrinking lateral dimensions. That pushed capacity up and cost per bit down enough to make very large SSDs practical for data-center and AI deployments.
What about 4D NAND?
4D NAND is not usually treated as a completely separate textbook category in the way 2D and 3D NAND are. In practice, it is best understood as an advanced form of 3D NAND combined with architectural changes around how peripheral circuitry, charge-trap structures, and array organization are laid out. Some vendors use the term to highlight that the design is not only vertically stacked, but also optimized in the surrounding cell architecture and control layout.
The important practical takeaway is that 4D NAND is usually still part of the broader 3D NAND era, not a clean break like planar-to-vertical NAND was. For AI servers, that means it matters mainly through the same outcomes engineers already care about: density, endurance behavior, controller efficiency, cost per bit, and enterprise SSD capacity scaling.
| Property | 2D NAND | 3D NAND |
|---|---|---|
| Cell layout | Planar, mostly horizontal | Vertically stacked layers |
| Scaling path | Shrink lateral geometry | Add layers and optimize stack |
| Density potential | More limited at advanced nodes | Far better for large SSD capacities |
| Role in modern AI servers | Mainly historical context or older generations | Dominant enterprise SSD media |
Where flash sits in AI servers
An AI server has multiple storage roles, and different flash technologies map to them differently. The boot path wants reliability and deterministic reads. The data path wants capacity, economics, and fast local persistence.
Boot and recovery plane
NOR is the anchor here. It stores the firmware that lets the board come alive, initializes management processors, and gives the machine a recovery story when the main OS is unavailable.
Main storage plane
NAND, typically inside NVMe SSDs, stores the large persistent state around AI jobs: datasets, local replicas, checkpoints, container images, logs, and warm caches used for rapid restarts or model rollouts.
Management and service plane
Service controllers such as the BMC often have their own NOR-backed firmware. That separation matters because out-of-band management must survive host crashes and remain independently bootable.
How AI workloads use flash
Training
During training, NAND-backed SSDs commonly hold dataset shards, preprocessing outputs, checkpoints, experiment logs, and local scratch space. The hot tensors themselves live in GPU memory or host DRAM, but NAND keeps the job fed and persistent.
Inference
During inference, NAND-backed storage commonly holds local model caches, deployment artifacts, vector indexes, embeddings, and sometimes spill or staging data. It helps fleets warm quickly and keeps model rollout latency under control.
Operations
Both flash types matter for operations. NOR preserves bootability and rollback paths. NAND preserves the evidence and artifacts that let operators restore services, reprovision nodes, and inspect failures.
Raw flash vs SSD-managed flash
This distinction is the one most readers miss. Linux can encounter flash in two completely different forms.
Raw flash
Raw flash means Linux is dealing directly with the NOR or NAND device. In that world the OS must respect erase-before-write behavior, bad blocks, ECC constraints, and geometry details. This is common for motherboard firmware devices, embedded controllers, or BMC storage.
SSD-managed flash
In an NVMe SSD, the controller firmware hides the physical NAND complexity. Linux sees a logical block device or namespace. The SSD controller internally manages the flash translation layer, ECC, bad-block remapping, garbage collection, and wear leveling.
In an AI server, this usually means an NVMe device whose controller happens to be backed by NAND, usually modern 3D NAND.
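To make that abstraction concrete, here is a minimal userspace sketch of what the host can actually see on a controller-managed drive; the device path /dev/nvme0n1 is an assumption for illustration. The only geometry the kernel reports is logical: total capacity and logical block size, never NAND pages, erase blocks, or wear state.

```c
/* Minimal sketch: how the host sees a controller-managed SSD.
 * The device path /dev/nvme0n1 is an assumption for illustration. */
#include <fcntl.h>
#include <linux/fs.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/dev/nvme0n1", O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	uint64_t bytes = 0;
	int logical_block = 0;

	/* Only logical geometry is visible: capacity and logical block size. */
	if (ioctl(fd, BLKGETSIZE64, &bytes) < 0 ||
	    ioctl(fd, BLKSSZGET, &logical_block) < 0) {
		perror("ioctl");
		close(fd);
		return 1;
	}

	printf("capacity: %llu bytes, logical block: %d bytes\n",
	       (unsigned long long)bytes, logical_block);

	/* Nothing here exposes NAND pages, erase blocks, or wear state;
	 * the SSD controller's flash translation layer hides all of that. */
	close(fd);
	return 0;
}
```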
Linux kernel and driver support
The software stack changes depending on whether the flash is raw or controller-managed.
Where these drivers live in the kernel structure
At a high level, Linux splits this problem by abstraction boundary. Raw flash sits in the memory-technology world. SSD-backed flash sits in the block-storage world. That split is why MTD and NVMe feel so different even when the underlying persistent medium is ultimately “flash.”
- `drivers/mtd/` holds the flash-oriented subsystem for raw memory technology devices.
- `drivers/mtd/spi-nor/` is the common home for SPI NOR support.
- `drivers/mtd/nand/raw/` contains raw NAND support.
- `drivers/mtd/nand/spi/` contains SPI NAND support in kernels that split it that way.
- `drivers/spi/` contains SPI controller drivers that sit underneath SPI NOR and some SPI NAND paths.
- `drivers/nvme/host/` contains the NVMe host-side driver stack used for SSDs on PCIe.
- `block/` contains the generic block layer that filesystems and storage drivers use to exchange I/O requests.
- `fs/` contains the filesystems that sit on top of block storage or, in some raw-flash cases, flash-aware layers such as `UBIFS`.
NOR flash on Linux
Raw NOR is usually supported through the MTD subsystem. A common path is MTD plus spi-nor on top of an SPI controller driver. Linux probes the chip, exposes MTD partitions, and supports read, erase, and program operations along with vendor-specific quirks and protections.
The usual pieces are the MTD path plus a bus-specific flash driver such as `spi-nor`:

- `MTD` core for memory technology devices
- `spi-nor` for SPI NOR chips
- `spi-mem` and SPI controller drivers underneath
- Partitioning and firmware update workflows in userspace
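From userspace, a raw NOR device shows up as an MTD character device that reports flash geometry rather than a plain block device. Here is a minimal sketch using the standard MTD ioctl interface; the node /dev/mtd0 is an assumption for illustration.

```c
/* Minimal sketch: how userspace sees a raw MTD (e.g., SPI NOR) device.
 * The node /dev/mtd0 is an assumption for illustration. */
#include <fcntl.h>
#include <mtd/mtd-user.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/dev/mtd0", O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	struct mtd_info_user info;

	/* Unlike a block device, MTD exposes flash geometry directly:
	 * total size, erase-block size, and write (page) size. */
	if (ioctl(fd, MEMGETINFO, &info) < 0) {
		perror("MEMGETINFO");
		close(fd);
		return 1;
	}

	printf("size: %u bytes, erasesize: %u, writesize: %u, type: %s\n",
	       info.size, info.erasesize, info.writesize,
	       info.type == MTD_NORFLASH ? "NOR" : "other");

	close(fd);
	return 0;
}
```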
Raw NAND on Linux
Raw NAND is more complicated. Linux still often uses the MTD stack, but adds raw NAND support, SPI NAND support in some designs, ECC machinery, and layers such as UBI and UBIFS for wear-aware volume management.
- `MTD` core
- `rawnand` core
- `spinand` for SPI NAND devices
- NAND controller drivers
- `UBI` and `UBIFS` in raw NAND deployments
This path is much more common in embedded systems and service controllers than in the host-side storage path of an AI server.
NAND inside SSDs on Linux
When NAND is inside an SSD, Linux usually talks to NVMe, not to NAND directly. The kernel stack is the PCIe subsystem, the nvme host driver, the block layer, and then the filesystem. The SSD firmware is what absorbs the flash complexity below that.
In practice this means NVMe to a controller-managed SSD backed by NAND media.

| Flash case | Typical Linux subsystem | Common drivers or layers | Common AI-server role |
|---|---|---|---|
| Raw NOR | MTD | spi-nor, SPI controller drivers | UEFI, BIOS, BMC, firmware images |
| Raw NAND | MTD | rawnand, spinand, UBI, UBIFS | Embedded controllers, service devices |
| NAND in SSDs | block + NVMe | nvme, PCIe, filesystem | Datasets, checkpoints, logs, model cache |
Kernel init and probe flow
The simplest useful way to understand Linux flash support is to follow the probe path. The exact source files differ by platform, but the high-level bring-up sequence is consistent: the kernel discovers a bus or controller, binds the matching driver, identifies the flash device, allocates subsystem structures, and finally exposes an interface that upper layers can use.
Basic NOR initialization flow
- The platform firmware or device tree describes an SPI controller and a child flash device.
- The SPI controller driver registers itself with the kernel and initializes the bus hardware.
- The SPI core matches the flash child node to the `spi-nor` path.
- `spi-nor` reads JEDEC identification data and determines chip geometry and capabilities.
- The MTD layer allocates an `mtd_info`-style device representation and wires in read, erase, and program callbacks.
- Partitions may then be registered, after which userspace sees MTD devices it can read or update.
In plain English, the SPI controller comes up first, the flash driver learns what chip is attached, and the MTD subsystem becomes the public interface that the rest of Linux uses.
Basic raw NAND initialization flow
- A NAND controller driver initializes its hardware registers, DMA setup, and timing hooks.
- The raw NAND core or SPI NAND layer identifies the device and learns page size, erase-block size, and bad-block rules.
- The driver configures ECC requirements, either through a dedicated ECC engine or controller-integrated logic.
- The MTD core registers the NAND device as a flash-aware object rather than as a normal disk.
- Optional upper layers such as `UBI` and `UBIFS` are attached if the platform uses them.
This is why raw NAND drivers feel more complex than NOR drivers: they are not just discovering capacity, they are also establishing the correctness rules needed to survive bad blocks and bit errors.
Basic NVMe SSD initialization flow
- PCIe enumeration discovers the SSD as a PCIe function.
- The kernel matches the device to the `nvme` host driver.
- The NVMe driver maps controller registers, negotiates queue resources, and sets up admin queues first.
- The driver identifies the controller, then enumerates namespaces that represent logical storage regions.
- The block layer registers those namespaces as block devices.
- Filesystems such as `XFS` or `ext4` mount on top, and applications finally see ordinary storage.
The important difference from raw flash is that Linux never needs to understand the internal NAND geometry. The SSD firmware has already converted the flash medium into a logical namespace abstraction.
Why init routines matter in production
These initialization steps are not just academic. If the wrong SPI flash description is shipped, the NOR path may boot unreliably. If ECC settings are mismatched, a raw NAND design may silently degrade. If PCIe or NVMe initialization is unstable, SSD namespaces may vanish or reset under load. In production AI servers, a large share of “storage weirdness” actually begins in probe, identification, or controller bring-up rather than in steady-state reads and writes.
Kernel code samples
Here are small illustrative code samples that show the shape of what these drivers and subsystems do. These are intentionally simplified, but they mirror the responsibilities you see in Linux: probe the device, identify capabilities, register with the right subsystem, and expose a stable interface to the rest of the system.
SPI NOR probe for platform firmware flash
static int ai_spi_nor_probe(struct spi_device *spi)
{
struct spi_nor *nor;
int ret;
nor = devm_kzalloc(&spi->dev, sizeof(*nor), GFP_KERNEL);
if (!nor)
return -ENOMEM;
nor->dev = &spi->dev;
spi_nor_set_flash_node(nor, spi->dev.of_node);
ret = spi_nor_scan(nor, NULL, NULL);
if (ret)
return ret;
ret = mtd_device_register(&nor->mtd, NULL, 0);
if (ret)
return ret;
dev_info(&spi->dev, "NOR flash ready for firmware slots\n");
return 0;
}
Raw NAND bring-up with ECC setup
static int ai_nand_probe(struct platform_device *pdev)
{
struct nand_chip *chip;
struct mtd_info *mtd;
int ret;
chip = devm_kzalloc(&pdev->dev, sizeof(*chip), GFP_KERNEL);
if (!chip)
return -ENOMEM;
chip->ecc.engine_type = NAND_ECC_ENGINE_TYPE_ON_HOST;
chip->ecc.size = 1024;
chip->ecc.strength = 8;
ret = nand_scan(chip, 1);
if (ret)
return ret;
mtd = nand_to_mtd(chip);
return mtd_device_register(mtd, NULL, 0);
}
NVMe controller initialization for NAND-backed SSDs
static int ai_nvme_probe(struct pci_dev *pdev,
const struct pci_device_id *id)
{
struct nvme_ctrl *ctrl;
int ret;
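	/*
	 * Simplified flow: these helper names echo the in-kernel NVMe host
	 * driver under drivers/nvme/host/, but the real functions have
	 * different signatures and considerably more setup work.
	 */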
ret = pcim_enable_device(pdev);
if (ret)
return ret;
ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
if (ret)
return ret;
ctrl = nvme_init_ctrl(&pdev->dev);
if (IS_ERR(ctrl))
return PTR_ERR(ctrl);
ret = nvme_alloc_admin_tags(ctrl);
if (ret)
return ret;
ret = nvme_start_ctrl(ctrl);
if (ret)
return ret;
return nvme_scan_ns_list(ctrl);
}
Block queue registration for storage traffic
static int ai_nvme_alloc_io_queues(struct nvme_ctrl *ctrl)
{
int qid;
for (qid = 1; qid <= ctrl->queue_count; qid++) {
struct blk_mq_tag_set *set = &ctrl->tagset[qid - 1];
set->ops = &ai_nvme_mq_ops;
set->nr_hw_queues = 1;
set->queue_depth = ctrl->q_depth;
set->numa_node = dev_to_node(ctrl->dev);
if (blk_mq_alloc_tag_set(set))
return -ENOMEM;
}
return 0;
}
Simple MTD firmware-slot write path
static int ai_firmware_slot_write(struct mtd_info *mtd,
loff_t off,
const u8 *image,
size_t len)
{
size_t written = 0;
int ret;
	if (off + len > mtd->size)
		return -EINVAL;
ret = mtd_erase(mtd, &(struct erase_info) {
.addr = 0,
.len = mtd->size,
});
if (ret)
return ret;
ret = mtd_write(mtd, off, len, &written, image);
if (ret || written != len)
return -EIO;
return 0;
}
The hidden vendor stack
The most under-discussed part of flash in AI servers is that “flash” is not one vendor decision. It is a stack of vendor decisions layered on top of one another. The NAND manufacturer, the SSD controller vendor, the server OEM, the BMC silicon vendor, the BIOS supplier, and the accelerator platform integrator all influence the behavior that operators eventually experience as “storage reliability” or “boot stability.”
1. The firmware-island problem
A modern AI server is full of tiny flash-backed control domains. The motherboard boot ROM may be on one NOR device. The BMC may boot from another. NICs, DPUs, retimers, storage controllers, and even SSDs themselves can each carry independent firmware images and update lifecycles. That means fleet reliability can fail at the seams between these firmware islands rather than at the main host OS.
2. The SSD controller is the real storage personality
People often compare NAND types as if the media alone defines behavior. In practice, enterprise SSD behavior is dominated by controller firmware policy: the flash translation layer, garbage collection strategy, overprovisioning choices, thermal policy, queue handling, recovery behavior, and power-fail logic. In other words, the SSD controller firmware is often the hidden operating system underneath AI storage.
That is why two drives with nominally similar 3D NAND can behave very differently under checkpoint bursts, mixed read-write pressure, or long-lived cache workloads. The media matters, but the controller policy often matters more.
3. The vendor map that actually matters
The most useful way to think about vendors is by layer, not by a flat list of brand names.
- NAND manufacturers: vendors such as Samsung, Kioxia, Micron, SK hynix, Solidigm, and Western Digital / SanDisk shape density, endurance classes, and enterprise SSD supply.
- SSD controller vendors: vendors such as Phison, Marvell, and Silicon Motion, along with in-house controller teams at major SSD vendors, shape the actual runtime behavior the host sees.
- Platform firmware vendors: suppliers such as AMI, Phoenix, and Insyde influence BIOS and UEFI behavior, while ASPEED is a major force in BMC silicon and out-of-band management.
- System OEMs and ODMs: vendors such as Dell, HPE, Lenovo, Supermicro, Gigabyte, Quanta, and Wiwynn decide NVMe topology, cooling, serviceability, and firmware integration quality.
- AI platform integrators: ecosystems built around NVIDIA, AMD, Intel, and hyperscaler reference designs influence how much local flash exists per node and which workloads that flash tier must absorb.
4. AI creates a strange endurance profile
AI workloads do not look like traditional enterprise storage mixes. Checkpoints create huge sequential write bursts. Model rollout and local cache warming can create heavy read waves. Telemetry and logging create smaller steady writes. Some systems add intermediate spill or artifact churn on top. The result is a wear profile that can be surprisingly asymmetric and much harsher than generic enterprise assumptions.
This is why write amplification, spare area, drive write per day ratings, and controller garbage-collection policy deserve much more attention in AI storage design than they usually get in high-level infrastructure discussions.
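A quick back-of-the-envelope calculation shows why. The sketch below uses entirely hypothetical numbers (checkpoint size, cadence, drive capacity, and write amplification factor) to estimate the drive writes per day a checkpoint-heavy node implies; plug in your own fleet values.

```c
/* Back-of-the-envelope sketch of checkpoint-driven drive wear.
 * All numbers are hypothetical; substitute your own fleet values. */
#include <stdio.h>

int main(void)
{
	double checkpoint_gb = 500.0;   /* size of one checkpoint written locally */
	double interval_min = 30.0;     /* checkpoint cadence */
	double capacity_tb = 7.68;      /* usable drive capacity */
	double waf = 1.5;               /* assumed write amplification factor */

	/* Host-visible writes per day, in TB. */
	double host_writes_tb_per_day =
		checkpoint_gb * (24.0 * 60.0 / interval_min) / 1000.0;

	/* Physical NAND writes after amplification, and implied DWPD. */
	double nand_writes_tb_per_day = host_writes_tb_per_day * waf;
	double implied_dwpd = nand_writes_tb_per_day / capacity_tb;

	printf("host writes: %.1f TB/day\n", host_writes_tb_per_day);
	printf("NAND writes after WAF: %.1f TB/day\n", nand_writes_tb_per_day);
	printf("implied DWPD: %.2f\n", implied_dwpd);
	return 0;
}
```

With these example numbers the implied rate lands well above the 1 to 3 drive-writes-per-day ratings common for read-intensive or mixed-use enterprise SSDs, which is exactly the kind of mismatch that goes unnoticed until drives wear out earlier than planned.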
5. Thermals and throttling are often misdiagnosed as software issues
AI operators often think in terms of GPU thermals and rack power, but local flash thermals matter too. Enterprise SSDs can throttle under sustained heavy writes, sustained random I/O, or dense front-bay thermal conditions. When they do, model load times stretch, checkpoint latency grows, and the symptoms often look like a software regression rather than a storage thermal event.
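One way to catch this early is to watch the drive's own composite temperature rather than only host-level metrics. The sketch below reads the NVMe SMART / Health log page through the admin passthrough ioctl; the path /dev/nvme0 is an assumption, and in practice a tool such as nvme-cli or a monitoring agent would normally do this for you.

```c
/* Sketch: reading the NVMe composite temperature from the host.
 * The path /dev/nvme0 is an assumption for illustration. */
#include <fcntl.h>
#include <linux/nvme_ioctl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/dev/nvme0", O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	uint8_t log[512] = { 0 };
	struct nvme_admin_cmd cmd = { 0 };

	cmd.opcode = 0x02;                       /* Get Log Page */
	cmd.nsid = 0xffffffff;                   /* controller-wide scope */
	cmd.addr = (uint64_t)(uintptr_t)log;
	cmd.data_len = sizeof(log);
	/* cdw10: log ID 0x02 (SMART / Health) plus (number of dwords - 1) */
	cmd.cdw10 = 0x02 | ((sizeof(log) / 4 - 1) << 16);

	if (ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd) < 0) {
		perror("NVME_IOCTL_ADMIN_CMD");
		close(fd);
		return 1;
	}

	/* Composite temperature is bytes 1-2 of the SMART log, in Kelvin. */
	uint16_t kelvin;
	memcpy(&kelvin, &log[1], sizeof(kelvin));
	printf("composite temperature: %d C\n", kelvin - 273);

	close(fd);
	return 0;
}
```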
6. Power-loss protection is not a boring enterprise checkbox
Power-loss protection on enterprise SSDs matters a lot more in AI than people usually admit. When a node is checkpointing or writing deployment artifacts, a sudden power event without robust protection can corrupt in-flight metadata or partially committed writes. In large fleets, those edge cases become inevitabilities rather than hypotheticals.
This is one of the reasons consumer SSD assumptions map badly onto AI infrastructure. The cost of a corrupted checkpoint, a damaged local cache, or a recovery-path failure is far larger than the sticker price difference between a consumer-style drive and a real enterprise one.
7. NOR is tiny, but existential
NOR rarely gets attention because it is small in capacity and invisible during normal operation. But from a fleet-operations perspective, NOR is existential. If that tiny device holds the wrong firmware, the node may fail to boot, fail to recover, or fail to rejoin the cluster after maintenance. NAND shortages hurt capacity planning; NOR mistakes hurt bootability.
The smallest flash device in the server may be the one with the highest operational leverage.
Practical takeaways
If you are a platform engineer, you care about NOR because it defines boot safety, rollback, recovery, and secure-update behavior. If you are a storage or kernel engineer, you care about whether the flash is raw or controller-managed, because that determines whether you work in the MTD world or the NVMe world. If you are an AI systems engineer, you mostly care about NAND-backed SSD behavior because that is the tier that feeds checkpoints, local caches, and deployment artifacts.
- NOR flash is the read-mostly firmware medium for boot and service processors.
- NAND flash is the density-first storage medium that usually appears as SSD capacity.
- 2D NAND is the older planar form; 3D NAND is the stacked form that made modern high-capacity enterprise SSDs viable.
- Linux usually supports raw flash through `MTD`, but supports SSD-backed NAND through `NVMe` and the block layer.