SRAM Interface Demo

A memory-control-plane prototype for explicit SRAM-style residency.

Explicit Memory Residency, Not Just Implicit Caching

Modern systems usually hide memory placement behind caches, coherence, and implicit hardware policy. This prototype explores the opposite: a small software-visible interface for reserving, binding, and verifying SRAM-like residency regions.

Why This Exists

The Problem with Caches

CPUs decide data placement in hardware, through multi-level caches (L1/L2/L3), replacement heuristics, coherence protocols, and speculation. For general workloads this works well; for AI workloads that need deterministic latency, the lack of software control can become a bottleneck.

AI Infrastructure Needs

AI data structures such as KV-cache tiles, tensor tiles, and expert weights can benefit from explicit placement, which aims to reduce tail latency and improve throughput.

Software-Defined Residency

This repo demonstrates a tiny software control plane that gives applications direct control over where data lives in the hardware hierarchy.

What the Prototype Includes

Mock SRAM Backend

A portable userspace implementation backed by a calloc-allocated buffer. It runs on any OS, which makes it suitable for rapid architecture testing and CI.
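As a rough illustration of the idea, a calloc-backed mock could look like the sketch below. The names (mock_sram_t, mock_sram_open, mock_sram_write32, and so on) are assumptions made for this example and are not taken from the repo.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mock backend: a zeroed heap buffer stands in for SRAM. */
typedef struct {
    uint8_t *base;  /* start of the simulated aperture */
    size_t   size;  /* aperture size in bytes */
} mock_sram_t;

/* Allocate a zero-filled aperture; returns 0 on success, -1 on failure. */
int mock_sram_open(mock_sram_t *m, size_t size)
{
    assert(m != NULL);
    m->base = calloc(1, size);
    m->size = m->base ? size : 0;
    return m->base ? 0 : -1;
}

/* Bounds-checked 32-bit store at a byte offset. */
int mock_sram_write32(mock_sram_t *m, size_t off, uint32_t v)
{
    if (off > m->size || m->size - off < sizeof v)
        return -1;
    memcpy(m->base + off, &v, sizeof v);  /* memcpy tolerates unaligned offsets */
    return 0;
}

/* Bounds-checked 32-bit load at a byte offset. */
int mock_sram_read32(const mock_sram_t *m, size_t off, uint32_t *out)
{
    if (off > m->size || m->size - off < sizeof *out)
        return -1;
    memcpy(out, m->base + off, sizeof *out);
    return 0;
}

/* Release the buffer and reset the handle. */
void mock_sram_close(mock_sram_t *m)
{
    free(m->base);
    m->base = NULL;
    m->size = 0;
}
```

Because the buffer comes from calloc, every location reads back as zero until it is written, which gives tests a known starting state.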

/dev/mem MMIO Backend

A hardware-facing Linux path that maps physical address ranges (FPGA BRAM, PCIe BAR, SoC SRAM) into userspace.
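The mapping step can be sketched with standard POSIX calls (open, mmap, munmap). The helper names aperture_map and aperture_unmap and the aperture_t struct are hypothetical, introduced only for this example; the repo's actual code may differ. On real hardware the path would be "/dev/mem" and the offset a physical base address, which normally requires root and may be restricted by kernel configuration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handle for one mapped address range. */
typedef struct {
    void  *map;             /* page-aligned address returned by mmap */
    size_t map_len;         /* length passed to mmap, needed for munmap */
    volatile uint8_t *ptr;  /* first byte of the requested range */
} aperture_t;

/* Map `len` bytes starting at byte offset `base` of `path`.
 * mmap offsets must be page-aligned, so round `base` down to a page
 * boundary and remember the remainder. Returns 0 on success, -1 on error. */
int aperture_map(aperture_t *a, const char *path, off_t base, size_t len)
{
    assert(a != NULL && path != NULL && base >= 0);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len == 0)
        return -1;
    off_t  aligned = base & ~((off_t)page - 1);
    size_t delta   = (size_t)(base - aligned);

    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return -1;
    void *p = mmap(NULL, len + delta, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, aligned);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;

    a->map     = p;
    a->map_len = len + delta;
    a->ptr     = (volatile uint8_t *)p + delta;
    return 0;
}

/* Unmap the range and reset the handle. */
void aperture_unmap(aperture_t *a)
{
    munmap(a->map, a->map_len);
    a->map = NULL;
    a->ptr = NULL;
    a->map_len = 0;
}
```

The pointer is declared volatile so the compiler does not cache or reorder away accesses to device memory. Taking the path as a parameter also lets the same helper be exercised against an ordinary file in tests.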

mem_hint API

A clean C API for residency intent: reserve, bind, write, read, and verify.
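To make the five verbs concrete, here is a minimal sketch of how such an API could be shaped. Every identifier (mh_backend_t, mh_region_t, mh_reserve, mh_bind, mh_write, mh_read, mh_verify) and every return-code convention is an assumption for illustration, not the repo's actual header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical backend: any byte-addressable aperture (mock or mapped). */
typedef struct {
    uint8_t *base;  /* start of the aperture */
    size_t   size;  /* aperture size in bytes */
} mh_backend_t;

/* Hypothetical region: a named reservation, optionally bound to a backend. */
typedef struct {
    const char   *name;     /* label, e.g. "kv_tile_0" */
    size_t        len;      /* reserved length in bytes */
    size_t        offset;   /* byte offset inside the backend once bound */
    mh_backend_t *backend;  /* NULL until mh_bind succeeds */
} mh_region_t;

/* reserve: record intent (name and size) without touching any memory. */
int mh_reserve(mh_region_t *r, const char *name, size_t len)
{
    assert(r != NULL);
    if (!name || len == 0) return -1;
    r->name = name; r->len = len; r->offset = 0; r->backend = NULL;
    return 0;
}

/* bind: attach the region to a backend at a fixed offset, bounds-checked. */
int mh_bind(mh_region_t *r, mh_backend_t *b, size_t offset)
{
    assert(r != NULL);
    if (!b || !b->base) return -1;
    if (offset > b->size || b->size - offset < r->len) return -1;
    r->backend = b; r->offset = offset;
    return 0;
}

/* write: copy n bytes into the bound region. */
int mh_write(mh_region_t *r, const void *src, size_t n)
{
    assert(r != NULL);
    if (!r->backend || !src || n > r->len) return -1;
    memcpy(r->backend->base + r->offset, src, n);
    return 0;
}

/* read: copy n bytes out of the bound region. */
int mh_read(const mh_region_t *r, void *dst, size_t n)
{
    assert(r != NULL);
    if (!r->backend || !dst || n > r->len) return -1;
    memcpy(dst, r->backend->base + r->offset, n);
    return 0;
}

/* verify: 0 if the first n bytes match `expect`, 1 if they differ, -1 on error. */
int mh_verify(const mh_region_t *r, const void *expect, size_t n)
{
    assert(r != NULL);
    if (!r->backend || !expect || n > r->len) return -1;
    return memcmp(r->backend->base + r->offset, expect, n) == 0 ? 0 : 1;
}
```

A caller would reserve a region such as "kv_tile_0", bind it to a backend at an offset such as 0x100, write a payload, and then call the verify function with the same payload to confirm the readback. Separating reserve from bind keeps the statement of intent independent of which backend ends up holding the data.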

CLI Demo

A runnable demonstration of the entire flow in mock mode, including AI-centric examples for KV caches and tensors.

Architecture

The stack bridges the gap between high-level AI runtimes and low-level hardware memory apertures.

[Figure: architecture diagram]

Demo Flow

A typical interaction involves reserving a region, binding it to a backend, and managing data residency.

[Figure: demo flow diagram]

Terminal — sram_demo
$ make
$ ./sram_demo

[mem_hint] reserve "kv_tile_0" → SRAM
[mem_hint] bind → offset 0x100
[mem_hint] write → 39 bytes
[mem_hint] readback → verified ✔

Backend Comparison

Backend         Platform       Purpose                      Status
Mock SRAM       Any OS         Development/testing          Implemented
/dev/mem MMIO   Linux          Hardware SRAM aperture       Implemented
UIO/VFIO        Linux          Safer hardware mapping       Planned
/dev/mem_hint   Linux Kernel   Explicit residency control   Planned

Use Cases

KV-Cache Tile Residency

Explicitly binding active LLM attention blocks to fast on-chip memory.

Tensor Tile Staging

Overlapping data movement with compute by manually staging tiles in SRAM.
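The usual way to get this overlap is ping-pong (double) buffering: while one SRAM slot is being computed on, the next tile is loaded into the other. The sketch below shows only the schedule. The functions stage_tile, compute_tile, and run_pingpong are hypothetical, and the staging here is a synchronous memcpy, so nothing actually runs in parallel; on real hardware the staging call would start an asynchronous DMA or copy engine instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TILE_ELEMS 4  /* elements per tile; kept tiny for illustration */

/* Stand-in for a DMA or copy engine: a plain synchronous copy. */
void stage_tile(float *sram_slot, const float *src)
{
    memcpy(sram_slot, src, TILE_ELEMS * sizeof(float));
}

/* Stand-in for compute: sum the elements of one tile. */
float compute_tile(const float *tile)
{
    float s = 0.0f;
    for (size_t i = 0; i < TILE_ELEMS; i++)
        s += tile[i];
    return s;
}

/* Process n_tiles tiles from `dram` using two SRAM slots in ping-pong order. */
float run_pingpong(const float *dram, size_t n_tiles, float sram[2][TILE_ELEMS])
{
    assert(dram != NULL && sram != NULL);
    float total = 0.0f;
    if (n_tiles == 0)
        return total;

    stage_tile(sram[0], dram);  /* prologue: fill slot 0 before the loop */
    for (size_t i = 0; i < n_tiles; i++) {
        size_t cur = i & 1;     /* slot holding the tile to compute on */
        size_t nxt = cur ^ 1;   /* slot that is free to be refilled */
        if (i + 1 < n_tiles)    /* issue the next load first ... */
            stage_tile(sram[nxt], dram + (i + 1) * TILE_ELEMS);
        total += compute_tile(sram[cur]);  /* ... then compute on the current tile */
    }
    return total;
}
```

The two slots would be adjacent regions of the SRAM aperture. The ordering inside the loop, load next then compute current, is what allows an asynchronous copy engine to hide transfer time behind compute.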

MoE Expert-Weight Promotion

Promoting active expert weights to fast residency before execution.

FPGA Scratchpad Memory

Managing non-coherent FPGA local memory from host software.

Near-Memory Accelerator

Pushing operands into local accelerator memory apertures.

Compiler-Directed Placement

Automated residency hints emitted by AI compilers like XLA or Triton.

Mock Benchmark

A low-level latency benchmark measures the per-operation overhead of the memory-control-plane logic in mock mode. The numbers validate the software control path only; because the mock backend is ordinary host memory, they do not reflect physical SRAM or MMIO latency.

Terminal — latency_bench
$ ./latency_bench 1000000

[bench] backend: mock SRAM
[bench] iterations: 1000000
[bench] write32 avg: 4.52 ns/op
[bench] read32 avg: 3.10 ns/op
[bench] verify: OK ✔
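A measurement loop of this kind can be sketched with clock_gettime(CLOCK_MONOTONIC). The function names bench_write32 and bench_read32 are assumptions for this example and the repo's latency_bench may be structured differently. The loop and index arithmetic are included in the measured time, so results are an upper bound on the access cost itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Elapsed nanoseconds between two CLOCK_MONOTONIC samples. */
static double elapsed_ns(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) * 1e9 +
           (double)(b.tv_nsec - a.tv_nsec);
}

/* Average ns per 32-bit store; `volatile` keeps the stores from being elided. */
double bench_write32(volatile uint32_t *buf, size_t words, size_t iters)
{
    struct timespec t0, t1;
    assert(buf != NULL && words > 0 && iters > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < iters; i++)
        buf[i % words] = (uint32_t)i;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_ns(t0, t1) / (double)iters;
}

/* Average ns per 32-bit load; the running sum is returned via sum_out
 * so callers can check that the expected data was actually read. */
double bench_read32(const volatile uint32_t *buf, size_t words, size_t iters,
                    uint32_t *sum_out)
{
    struct timespec t0, t1;
    uint32_t sum = 0;
    assert(buf != NULL && sum_out != NULL && words > 0 && iters > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < iters; i++)
        sum += buf[i % words];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *sum_out = sum;
    return elapsed_ns(t0, t1) / (double)iters;
}
```

Averaging over many iterations, as the demo does with 1000000, amortizes the cost of the two clock reads. Pointing the same functions at a mapped hardware aperture instead of a heap buffer would be one way to compare mock and device numbers.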

Safety & Limitations

⚠️ This is not a production driver. It is a design sketch for a memory control plane.

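Raw /dev/mem access is dangerous: it typically requires root, bypasses the kernel's usual memory protections, and a wrong address can corrupt or crash the system. Real systems should replace it with UIO/VFIO or a dedicated kernel driver.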

Mock mode is provided for portability and architectural demonstration only. The future /dev/mem_hint interface is conceptual.

Roadmap