SRAM Interface Demo

A memory-control-plane prototype for explicit SRAM-style residency.


Explicit Memory Residency, Not Just Implicit Caching

Modern systems usually hide memory placement behind caches, coherence, and implicit hardware policy. This prototype explores the opposite: a small software-visible interface for reserving, binding, and verifying SRAM-like residency regions.

Why This Exists

The Problem with Caches

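CPUs hide data placement behind hardware heuristics: multi-level caches (L1/L2/L3), coherence protocols, and speculation. That works well for general-purpose workloads, but for AI workloads that need deterministic latency, evictions and misses the application cannot control can become a bottleneck.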

AI Infrastructure Needs

AI data structures such as KV-cache tiles, tensor tiles, and expert weights can benefit from explicit placement, which aims to reduce tail latency and improve throughput.

Software-Defined Residency

This repo demonstrates a tiny software control plane that lets applications state where data should live in the memory hierarchy, instead of leaving that decision to implicit hardware policy.

What the Prototype Includes

Mock SRAM Backend

A portable userspace implementation using calloc. Runs on any OS for rapid architecture testing and CI.
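
As a rough illustration of what a calloc-backed mock can look like, the sketch below models the SRAM as a zeroed heap arena with a bump-pointer reservation scheme. The names (mock_sram, mock_sram_init, mock_sram_reserve, mock_sram_free) are hypothetical and are not taken from the repo.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mock backend: a zeroed heap arena standing in for SRAM. */
typedef struct {
    uint8_t *base;  /* start of the calloc'd arena */
    size_t   size;  /* arena capacity in bytes */
    size_t   next;  /* bump pointer: first unreserved offset */
} mock_sram;

/* Allocate the arena. Returns 0 on success, -1 if calloc fails. */
static int mock_sram_init(mock_sram *m, size_t size)
{
    m->base = calloc(1, size);
    m->size = m->base ? size : 0;
    m->next = 0;
    return m->base ? 0 : -1;
}

/* Reserve len bytes aligned to align (must be a power of two).
 * Returns the region's offset, or (size_t)-1 if it does not fit. */
static size_t mock_sram_reserve(mock_sram *m, size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (m->next + align - 1) & ~(align - 1);
    if (start > m->size || len > m->size - start)
        return (size_t)-1;
    m->next = start + len;
    return start;
}

/* Release the arena and reset the bookkeeping. */
static void mock_sram_free(mock_sram *m)
{
    free(m->base);
    m->base = NULL;
    m->size = 0;
    m->next = 0;
}
```

A bump pointer keeps the sketch short; it never reclaims individual regions, which is usually acceptable for tests and CI but would need a real allocator for long-running use.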

/dev/mem MMIO Backend

A hardware-facing Linux path that maps physical address ranges (FPGA BRAM, PCIe BAR, SoC SRAM) into userspace.
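
The mapping step can be sketched with standard POSIX calls. mmap requires a page-aligned file offset, so the sketch rounds the physical address down to a page boundary and keeps the remainder as slack. The aperture type and the aperture_map / aperture_unmap helpers are hypothetical names for illustration, not the repo's actual code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handle for one mapped physical range. */
typedef struct {
    void  *map_base;         /* page-aligned address returned by mmap */
    size_t map_len;          /* bytes actually mapped (len + slack) */
    volatile uint8_t *regs;  /* points at the requested address */
} aperture;

/* Map len bytes starting at offset phys of the device node at path.
 * Returns 0 on success, -1 on failure. */
static int aperture_map(aperture *ap, const char *path, off_t phys, size_t len)
{
    assert(len > 0);
    off_t mask = (off_t)sysconf(_SC_PAGESIZE) - 1;
    off_t aligned = phys & ~mask;            /* round down to a page */
    size_t slack = (size_t)(phys - aligned); /* bytes skipped in page */

    ap->map_base = NULL;
    ap->map_len = 0;
    ap->regs = NULL;

    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return -1;
    void *p = mmap(NULL, len + slack, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, aligned);
    close(fd); /* the mapping stays valid after the fd is closed */
    if (p == MAP_FAILED)
        return -1;

    ap->map_base = p;
    ap->map_len = len + slack;
    ap->regs = (volatile uint8_t *)p + slack;
    return 0;
}

/* Drop the mapping and clear the handle. */
static void aperture_unmap(aperture *ap)
{
    if (ap->map_base)
        munmap(ap->map_base, ap->map_len);
    ap->map_base = NULL;
    ap->map_len = 0;
    ap->regs = NULL;
}
```

Passing "/dev/mem" as the path typically requires root, and kernels built with CONFIG_STRICT_DEVMEM may refuse ranges outside device memory. The physical address and length must come from the platform (for example a PCIe BAR or an FPGA address map); no specific values are assumed here.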

mem_hint API

A clean C API for residency intent: reserve, bind, write, read, and verify.
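
To make the five verbs concrete, here is a minimal sketch of how such an API could be shaped. Every identifier (mh_region, mh_reserve, mh_bind, mh_write, mh_read, mh_verify) is an assumption made for illustration; the repo's real signatures may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical region descriptor. */
typedef struct {
    const char *name;   /* label such as "kv_tile_0" */
    size_t      size;   /* reserved length in bytes */
    size_t      offset; /* offset inside the backend, set by bind */
    uint8_t    *base;   /* NULL until the region is bound */
} mh_region;

/* reserve: record the intent only; nothing is backed yet. */
static void mh_reserve(mh_region *r, const char *name, size_t size)
{
    assert(r != NULL);
    r->name = name;
    r->size = size;
    r->offset = 0;
    r->base = NULL;
}

/* bind: attach the region to backend memory at the given offset.
 * Returns -1 (region unchanged) if it would exceed the capacity. */
static int mh_bind(mh_region *r, uint8_t *backend, size_t capacity,
                   size_t offset)
{
    if (offset > capacity || r->size > capacity - offset)
        return -1;
    r->base = backend + offset;
    r->offset = offset;
    return 0;
}

/* True when the region is bound and [off, off + len) lies inside it. */
static int mh_in_bounds(const mh_region *r, size_t off, size_t len)
{
    return r->base != NULL && off <= r->size && len <= r->size - off;
}

static int mh_write(mh_region *r, size_t off, const void *src, size_t len)
{
    if (!mh_in_bounds(r, off, len))
        return -1;
    memcpy(r->base + off, src, len);
    return 0;
}

static int mh_read(const mh_region *r, size_t off, void *dst, size_t len)
{
    if (!mh_in_bounds(r, off, len))
        return -1;
    memcpy(dst, r->base + off, len);
    return 0;
}

/* verify: 0 if the region holds exactly the expected bytes, else -1. */
static int mh_verify(const mh_region *r, size_t off, const void *expect,
                     size_t len)
{
    if (!mh_in_bounds(r, off, len))
        return -1;
    return memcmp(r->base + off, expect, len) == 0 ? 0 : -1;
}
```

Separating reserve from bind keeps the statement of intent independent of the backend, so the same region description could be bound to a mock arena or to a mapped aperture. The plain memcpy and memcmp calls suit the mock case; a real MMIO aperture may need volatile, width-specific accesses instead.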

CLI Demo

A runnable demonstration of the entire flow in mock mode, including AI-centric examples for KV caches and tensors.

Architecture

The stack bridges the gap between high-level AI runtimes and low-level hardware memory apertures.

Demo Flow

A typical interaction involves reserving a region, binding it to a backend, and managing data residency.

Terminal — sram_demo

$ make
$ ./sram_demo

[mem_hint] reserve "kv_tile_0" → SRAM
[mem_hint] bind → offset 0x100
[mem_hint] write → 39 bytes
[mem_hint] readback → verified ✔

Backend Comparison

Backend	Platform	Purpose	Status
Mock SRAM	Any OS	Development/testing	Implemented
/dev/mem MMIO	Linux	Hardware SRAM aperture	Implemented
UIO/VFIO	Linux	Safer hardware mapping	Planned
/dev/mem_hint	Linux Kernel	Explicit residency control	Planned

Use Cases

KV-Cache Tile Residency

Explicitly binding active LLM attention blocks to fast on-chip memory.

Tensor Tile Staging

Overlapping data movement with compute by manually staging tiles in SRAM.

MoE Expert-Weight Promotion

Promoting active expert weights to fast residency before execution.

FPGA Scratchpad Memory

Managing non-coherent FPGA local memory from host software.

Near-Memory Accelerator

Pushing operands into local accelerator memory apertures.

Compiler-Directed Placement

Automated residency hints emitted by AI compilers like XLA or Triton.
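
The tensor tile staging case above follows a ping-pong (double-buffer) pattern, sketched here under simplifying assumptions. The function name sum_tiles_staged and the tile size are hypothetical, and memcpy stands in for a data mover: in this sequential sketch the copy does not actually overlap the compute loop, whereas an asynchronous DMA engine would let the two run concurrently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TILE_ELEMS 64 /* hypothetical tile size, in floats */

/* Sum n floats from src, staging each tile through one of two slots
 * in sram. sram must hold 2 * TILE_ELEMS floats and n must be a
 * multiple of TILE_ELEMS. */
static float sum_tiles_staged(const float *src, size_t n, float *sram)
{
    assert(n % TILE_ELEMS == 0);
    size_t tiles = n / TILE_ELEMS;
    float total = 0.0f;
    if (tiles == 0)
        return total;

    /* Prologue: stage tile 0 into slot 0. */
    memcpy(sram, src, TILE_ELEMS * sizeof(float));

    for (size_t t = 0; t < tiles; t++) {
        float *cur  = sram + (t & 1) * TILE_ELEMS;       /* being consumed */
        float *next = sram + ((t + 1) & 1) * TILE_ELEMS; /* idle slot */

        /* Stage tile t+1 into the idle slot before consuming tile t.
         * With an async copy engine this is the step that overlaps. */
        if (t + 1 < tiles)
            memcpy(next, src + (t + 1) * TILE_ELEMS,
                   TILE_ELEMS * sizeof(float));

        /* Compute on the tile that is already resident. */
        for (size_t i = 0; i < TILE_ELEMS; i++)
            total += cur[i];
    }
    return total;
}
```

Two slots are enough because the compute step only ever touches the tile staged in the previous iteration, so the other slot is free to receive the next one.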

Mock Benchmark

A low-level latency benchmark measures the overhead of the memory-control-plane logic in mock mode. It validates control-path overhead only: the mock backend is ordinary heap memory, so the numbers do not reflect physical hardware latency.

Terminal — latency_bench

$ ./latency_bench 1000000

[bench] backend: mock SRAM
[bench] iterations: 1000000
[bench] write32 avg: 4.52 ns/op
[bench] read32 avg: 3.10 ns/op
[bench] verify: OK ✔
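
A timing loop of this kind can be sketched with clock_gettime and volatile 32-bit accesses. The helpers below (now_ns, bench_write32, bench_read32) are hypothetical and are not the repo's latency_bench source; they only illustrate how an average ns/op figure could be derived.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds. */
static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Time iters 32-bit stores; returns the average cost in ns per op.
 * volatile stops the compiler from eliding the stores. The modulo
 * used for indexing is part of the measured cost. */
static double bench_write32(volatile uint32_t *mem, size_t words,
                            size_t iters)
{
    assert(words > 0 && iters > 0);
    double t0 = now_ns();
    for (size_t i = 0; i < iters; i++)
        mem[i % words] = (uint32_t)i;
    return (now_ns() - t0) / (double)iters;
}

/* Time iters 32-bit loads; the running sum is written to *sink so the
 * loads have an observable result. */
static double bench_read32(const volatile uint32_t *mem, size_t words,
                           size_t iters, uint32_t *sink)
{
    assert(words > 0 && iters > 0);
    uint32_t acc = 0;
    double t0 = now_ns();
    for (size_t i = 0; i < iters; i++)
        acc += mem[i % words];
    double dt = now_ns() - t0;
    *sink = acc;
    return dt / (double)iters;
}
```

Averages from a loop like this include loop and indexing overhead and depend heavily on the host CPU and its caches, so they are only meaningful as a relative check on the control path.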

Safety & Limitations

⚠️ This is not a production driver. It is a design sketch for a memory control plane.

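Raw /dev/mem access is dangerous: it requires root, exposes the physical address space, and a wrong address can corrupt memory or hang a device. Real systems should replace it with UIO/VFIO or a dedicated kernel driver.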

Mock mode is provided for portability and architectural demonstration only. The future /dev/mem_hint interface is conceptual.

Roadmap

Mock SRAM Backend: Fully functional userspace simulation.
/dev/mem Backend: MMIO support for physical hardware apertures.
CI + Examples: GitHub Actions integration and AI-centric demo code.
UIO/VFIO Backend: Moving towards safer hardware mapping interfaces.
/dev/mem_hint Kernel Interface: Conceptual character device for safe residency management.