Systems architecture · virtual memory · storage scale

When 128 TB Stops Feeling Infinite

For years, 48-bit virtual addressing sounded effectively unbounded. Then storage kept scaling, mmap stayed convenient, and modern systems quietly wandered into a new class of limit: not RAM exhaustion, but address-space exhaustion.

By Manish KL · Updated draft · Topics: x86-64 address space, mmap, storage-dense systems

The headline problem is simple: on many widely deployed x86-64 Linux systems, user space is effectively limited to about 128 TB of virtual address space under the common 4-level paging layout. That felt enormous when disks were smaller and applications were less storage-dense. It feels much smaller once a single process starts mapping dozens of multi-terabyte SSD-backed files.

The key point is easy to miss: mmap consumes virtual address space even when pages are not resident in RAM. So a workload can run out of addressability before it runs out of storage, and before it runs out of physical memory.

The argument in one glance

48 bits — the total canonical VA, often cited as 256 TB.
47 bits — typical user space on common x86-64 Linux layouts: 128 TB.
15 × 8 TB — 120 TB of files, already pushing against that user-space ceiling.

The misleading comfort of “48 bits is enough”

The familiar statement goes like this: 48 bits gives you 256 TB of addressable space, so what is the problem?

2^48 bytes = 256 TB

That arithmetic is fine. The trouble is that application software usually does not own all 256 TB. On the common x86-64 split used by many Linux systems with 4-level paging, user mode gets roughly half of that canonical space.

2^47 bytes = 128 TB of user virtual address space

So the practical ceiling for a single user-space process is not the full 256 TB figure people casually cite. It is closer to 128 TB, and that ceiling becomes surprisingly reachable in storage-heavy designs.

And in practice, the real limit is often lower. You rarely get one pristine, contiguous 128 TB arena. Address space is already fragmented by shared libraries, stacks, heaps, guard regions, allocator behavior, and layout choices. The slogan number is clean. Reality is diced up.

[Figure: the canonical split. User space is ~128 TB (47 bits); the kernel holds the remaining canonical half. 15 NVMe SSDs × 8 TB each = 120 TB of mapped files consumes almost the whole user VA budget.]

Why the problem arrives sooner than the "256 TB" slogan suggests: the raw 48-bit number is not what a single process typically gets to use. On many systems, the effective user-space ceiling is about 128 TB, and storage arrays can approach that faster than many engineers expect.

Why mmap makes this limit concrete

mmap is seductive because it makes file-backed data feel like memory. That simplicity has real value. But the abstraction comes with a hidden cost: every mapping reserves a portion of the process’s virtual address space.

Importantly, this is true even when the pages are not yet faulted into DRAM. The address range still exists. The mapping still occupies virtual address real estate. So a system can hit a virtual-address ceiling while still having plenty of free physical memory and plenty of disk capacity left.

We are no longer running out of RAM; we are running out of places to point to.

The failure mode is quieter than people expect

The failure is not dramatic. There is no cinematic collapse where the machine obviously “runs out of memory.” Instead, a large mapping simply fails. The process gets ENOMEM from mmap, even while RAM is mostly free and disks are nowhere near full. That is the moment people realize they did not run out of memory. They ran out of addressability.

void* p = mmap(NULL, huge_len, PROT_READ, MAP_SHARED, fd, 0);
if (p == MAP_FAILED) {
    perror("mmap");   // often: ENOMEM
}

And for very large mappings, contiguity matters. Even with enough aggregate free virtual space, fragmentation can make a specific mmap request impossible to satisfy as one clean region.

A simple thought experiment

Assume a storage-heavy analytics or vector-search machine with fifteen 8 TB SSDs. That is 120 TB of file-backed capacity. If a design leans on one large process and maps most or all of those files, it is suddenly operating right next to the 128 TB user-space boundary.

That is the key inversion. Storage scale is beginning to collide with virtual address scale.

Why this stayed quiet for so long

For a long time, the industry could treat virtual address space as effectively infinite because other bottlenecks arrived first. Disks were smaller. Datasets were more fragmented. Applications naturally distributed data across machines or across processes before one process could plausibly map tens of terabytes.

That world changed. Multi-terabyte NVMe drives are now normal. Memory-mapped databases are common. Retrieval systems and vector stores often prefer large, file-backed indexes. AI systems increasingly manage corpora and artifacts that are too big to think of as a small set of files loaded in the old way.

Old assumption: "Address space is effectively infinite."
  Why it worked: disks were smaller and per-process mappings stayed modest.
  Why it is cracking now: high-density NVMe boxes and mmap-heavy designs can now approach the limit.

Old assumption: "If it is on disk, I can just map it."
  Why it worked: file-backed convenience usually dominated any hidden cost.
  Why it is cracking now: the hidden cost is no longer hidden at 100+ TB scale.

Old assumption: "Memory abstraction is flat enough."
  Why it worked: DRAM-centric systems hid tier boundaries.
  Why it is cracking now: HBM, DRAM, CXL, and SSD expose radically different access economics.

What kinds of systems feel this first

Not every machine will care. Most laptops and ordinary application servers will never come close. The pressure appears first in systems that combine all of the following:

- storage-dense hardware: tens of terabytes of file-backed data on a single node;
- mmap-heavy designs that map large files rather than streaming them;
- a single-process architecture that wants one flat view of all the data.

That makes the problem especially relevant to databases, vector indexes, search systems, analytics engines, and AI-adjacent retrieval pipelines.

This is one reason parts of the disk-based ANN and vector-store world have been moving toward custom buffer managers and more explicit control instead of pure "just mmap it" designs. Once the working set gets enormous, convenience stops being free.

How the industry solves it in practice

There is no single fix. Instead, the response is happening at three different layers: hardware/platform evolution, software workarounds, and architectural change.

1. Bigger virtual address spaces

The most straightforward answer is to expand the address-space ceiling itself. On x86, that means 5-level paging, often referred to by the configuration symbol CONFIG_X86_5LEVEL, enabling LA57 on supported systems.

2^57 bytes = 128 PB of virtual address space

This does not magically solve every systems problem, but it moves the cliff much farther away. For certain classes of storage-dense machines, that alone is a meaningful relief.

One tradeoff: 5-level paging adds another level to the page-table walk. In the common case, TLBs hide most of that cost. But on a TLB miss, the walk can be slightly more expensive. More bits buy headroom, not free lunch.

2. Stop mapping everything at once

Many production systems avoid the cliff by abandoning the “map the world” instinct. Instead, they use sliding windows, chunked mappings, or explicit I/O. The software becomes more complicated, but the address-space demand becomes more manageable.
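A sliding-window version of "map the file" might look like the sketch below (`sum_windowed` is an illustrative name): virtual-address demand is bounded by the window size instead of the file size, at the cost of extra map/unmap calls and more bookkeeping.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Sum a file's bytes by mapping one window at a time instead of the
 * whole file. `window` must be a multiple of the page size so every
 * offset passed to mmap stays page-aligned. */
static uint64_t sum_windowed(int fd, size_t file_len, size_t window) {
    uint64_t sum = 0;
    for (size_t off = 0; off < file_len; off += window) {
        size_t len = (file_len - off < window) ? file_len - off : window;
        unsigned char *p = mmap(NULL, len, PROT_READ, MAP_SHARED,
                                fd, (off_t)off);
        if (p == MAP_FAILED) abort();
        for (size_t i = 0; i < len; i++) sum += p[i];
        munmap(p, len);   /* release this window before mapping the next */
    }
    return sum;
}
```

Real systems layer caching and prefetch on top of this loop, but the core trade is the same: the process gives up "everything addressable at once" in exchange for a bounded VA footprint.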

3. Move toward explicit data orchestration

This is the deeper shift. The old model says: map data, let page faults discover access patterns reactively, and let the OS fill in the rest. The new model says: data movement needs to be intentional, tier-aware, and increasingly explicit.

That matters well beyond SSD-backed files. The same pressure appears once systems span HBM, DRAM, CXL-attached memory, local NVMe, and remote storage. Not all tiers should look like equally addressable memory, and not all movement should be reactive.

[Figure: two design philosophies. A. "Map everything" model: application → mmap → VA space → SSD. Simple abstraction, but every mapping burns VA space. B. Orchestrated movement model: runtime/app → data orchestrator → HBM / DRAM / CXL / SSD. Not every byte must be addressable at once; placement and movement become explicit.]

The architectural shift is larger than just adding more bits. Systems are moving from reactive, page-fault-driven discovery toward explicit, tier-aware movement and placement.

CXL makes the story more interesting, not simpler

CXL-attached memory expands the physical capacity of a node. That sounds like it should make everything better, and in many ways it does. But it also increases the temptation to build even larger “map everything” systems. In other words, CXL can indirectly accelerate address-space pressure by encouraging single-node designs whose logical working universe keeps expanding faster than old assumptions about virtual space.

That is why CXL belongs in this discussion. It is not just another tier. It is one more reason the industry must separate physical capacity from what should be continuously and uniformly addressable in one process.

Why this matters beyond one Linux configuration

It is easy to read this as a niche complaint about a particular paging mode. That would miss the more important lesson. The lesson is that the old “storage can just pretend to be memory” story gets weaker as systems become more heterogeneous and more storage dense.

Once a machine spans multiple very different tiers, addressability and residency stop meaning the same thing. A byte can exist in a system without it making sense to keep it permanently represented in one giant, flat, process-visible virtual address layout.

The deeper takeaway: the abstraction that breaks first is not necessarily memory capacity. It is the assumption that one process can or should present the entire working universe as a single, continuously addressable map.

From the “why” to the “how”

This is where the conversation gets broader than mmap. If address space stops being the right control plane, then data movement has to become explicit. The system needs a richer notion of intent: what data matters now, what must stay resident, what can be windowed, what can be prefetched, and what belongs in which tier.

That is exactly why ideas like compiler- or runtime-emitted memory intent become interesting. The old model relies on faults and generic heuristics to infer what should happen. The emerging model is more declarative: the software stack says what phase it is in, what data will be hot, and what residency guarantees matter. This blog is the “why.” Memory-intent-style architectures are one candidate “how.”

So is this a serious problem?

For ordinary applications, no. For storage-dense, mmap-heavy, single-process designs, yes. That distinction matters. This is not a universal emergency. It is a real systems constraint that becomes very sharp in exactly the class of machines now being built for data-intensive workloads.

And that is why the issue is worth paying attention to. It reveals an emerging boundary in the old memory model. Bigger address spaces will buy time. Better runtime and operating-system design will reduce the immediate pressure. But the long-term answer is architectural: explicit orchestration of where data lives, when it moves, and which tiers are truly part of the active addressable working set.

A small symbol with a big message: CONFIG_X86_5LEVEL

There is something quietly profound about a configuration symbol like CONFIG_X86_5LEVEL. It looks mundane, but it marks the moment a limit once treated as academic becomes operationally relevant. It is the industry acknowledging that scale moved, and the memory model has to move with it.

The bigger story is not just “thank you for more bits.” It is this: we are entering an era where address space itself is a first-class systems design concern again.

We went from “640 KB should be enough for anyone” to “128 TB is tighter than expected” — not because machines got smaller, but because our abstractions stayed the same while scale did not.