/dev/mem_hint
A kernel control plane
for AI memory systems.
AI runtimes know exactly what phase they're executing. Memory controllers do not. /dev/mem_hint is the kernel-mediated path for sending that intent down into the memory policy layer — before the hardware is forced to guess, retrain, and pay the latency penalty.
AI software has intent. Memory hardware sees transactions.
Every LLM inference runtime knows, with complete certainty, what phase it is about to enter. It knows when the prefill sweep ends and token-by-token decode begins. It knows when an agentic loop fires off a tool call with its burst-mode access pattern. It knows when the system is idle between requests.
The memory controller knows none of this. It sees a stream of read and write commands — row activations, CAS operations, precharges — and has to infer the best signaling configuration from hardware telemetry alone: measured eye closure, die temperature, ECC error rates. This inference is reactive. By the time the controller adapts, latency has already been paid.
The result is a permanent structural mismatch. A decode step that needs sub-100ns random read latency is served by the same conservative timing margins calibrated for prefill. An idle system between inference requests keeps its PHY voltage swing and PLL at full-power levels, burning watts for nothing.
The fix is simple in principle: give the software a first-class channel to express its intent before the phase transition completes. That is exactly what /dev/mem_hint provides — and what is claimed in Indian Patent Application 202641053160.
Memory adaptation is currently reactive: hardware detects change and catches up. The invention makes it predictive: software announces intent 500µs before the transition, so the hardware is already configured when the first new-phase request arrives. This eliminates >95% of the phase-transition retraining penalty (2.1ms → <0.1ms).
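To get a feel for what that means at serving scale, here is a back-of-envelope sketch using the patent's own penalty figures; the transition rate is a hypothetical busy-endpoint workload, not a measured number:

```python
# Illustrative arithmetic only. The 2.1 ms and 0.1 ms figures come from
# the patent text; the transition rate is an assumed serving workload.
REACTIVE_PENALTY_MS = 2.1    # retrain after the phase change is detected
PREDICTIVE_PENALTY_MS = 0.1  # hint arrives 500 us ahead of the transition

def saved_ms_per_second(transitions_per_sec: float) -> float:
    """Latency budget reclaimed per second of serving."""
    return transitions_per_sec * (REACTIVE_PENALTY_MS - PREDICTIVE_PENALTY_MS)

# e.g. 50 prefill->decode transitions/s on a busy endpoint:
print(saved_ms_per_second(50))  # ~100 ms of transition stall removed per second
```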
| Layer | Knows | Blind to |
|---|---|---|
| vLLM | Phase, token rate, KV cache alloc rate | tRCD, V_swing, PHY config |
| Kernel | PMU events, C-state, process scheduling | Model phase semantics |
| MC firmware | Temperature, ECC, retries | Prefill vs. decode vs. agentic |
| DDR5 PHY | Signal eye, jitter, lane skew | Workload semantics entirely |
/dev/mem_hint bridges this gap. It creates a structured, kernel-validated channel that lets the top layer express what the bottom layer needs to know — without giving user-space unsafe direct access to hardware registers.
One register write. Full semantic intent.
The hint is deliberately small. A phase-transition hint should be cheap enough to emit on every state change in a high-frequency inference loop — adding zero perceptible overhead to the runtime. Eight bytes. One write() syscall. Kernel-side validation is a handful of comparisons. The MSR write is a single instruction.
/* Workload Hint Interface — 64-bit packed structure
As defined in Indian Patent Application 202641053160 */
struct mem_workload_hint {
uint8_t phase_id; /* 0x01=Prefill 0x02=Decode
0x03=Agentic 0x04=Idle
0x05=ForwardPass (training)
0x06=BackwardPass (training) */
uint16_t latency_target_ns; /* target P99 read latency, ns */
uint16_t bw_target_gbps; /* target sustained BW, GB/s */
uint8_t security_level; /* 0=normal 1=confidential
2=TEE-isolated */
uint8_t priority; /* 0–7: policy aggressiveness
vs thermal/ECC budget */
uint8_t reserved; /* future: fabric_id / numa_id */
} __attribute__((packed)); /* total: exactly 8 bytes */
/* Bit layout in encoded 64-bit MSR value:
[63:56] reserved
[55:48] priority
[47:40] security_level
[39:24] bw_target_gbps
[23: 8] latency_target_ns
[ 7: 0] phase_id */
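The layout above is easy to sanity-check with a small Python model; this mirrors the struct fields and bit positions shown, purely as an illustrative round-trip check, not kernel code:

```python
def encode_hint(phase_id, latency_ns, bw_gbps, security, priority):
    """Pack the hint fields into the 64-bit layout shown above."""
    return ((phase_id   & 0xFF)
          | (latency_ns & 0xFFFF) << 8
          | (bw_gbps    & 0xFFFF) << 24
          | (security   & 0xFF)   << 40
          | (priority   & 0xFF)   << 48)

def decode_hint(v):
    """Unpack a 64-bit hint value back into its fields."""
    return {
        "phase_id":   v & 0xFF,
        "latency_ns": (v >> 8)  & 0xFFFF,
        "bw_gbps":    (v >> 24) & 0xFFFF,
        "security":   (v >> 40) & 0xFF,
        "priority":   (v >> 48) & 0xFF,
    }

# Round-trip a decode-phase hint: phase 0x02, 90 ns P99, 150 GB/s, priority 7
v = encode_hint(0x02, 90, 150, 0, 7)
assert decode_hint(v)["latency_ns"] == 90
```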
The priority field (0–7) lets the runtime modulate how aggressively the Memory Policy Engine deviates from nominal margins. A priority of 7 allows the largest safe timing reduction. A priority of 0 keeps margins conservative regardless of phase — useful during model loading or when thermal headroom is tight. The policy engine multiplies margin deltas by a function of priority before applying them, subject to the safety limiter clamp.
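A minimal sketch of that scaling step follows. The linear form is an assumption for illustration; the text above only requires that margin deltas scale with priority before the safety limiter clamp:

```python
def scale_margin_delta(nominal_delta_clk: int, priority: int) -> int:
    """Scale a proposed timing reduction by priority 0-7.
    Priority 7 applies the full delta; priority 0 holds nominal margins.
    Linear scaling is an illustrative assumption, not the patented form."""
    priority = max(0, min(priority, 7))          # clamp to 3-bit field
    return (nominal_delta_clk * priority) // 7   # whole clocks only

# Decode wants tRCD 22 -> 18, i.e. a delta of -4 clocks:
assert scale_margin_delta(-4, 7) == -4   # fully aggressive
assert scale_margin_delta(-4, 0) == 0    # hold nominal margins
```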
| ID | Phase | Policy direction |
|---|---|---|
| 0x05 | ForwardPass | Bandwidth-optimised — identical to Prefill config. Large batch, high sustained BW. |
| 0x06 | BackwardPass | Reduced tWR (write recovery time) for higher gradient write throughput. GPU firmware agent synchronised via IPC. |
Turning a hint into a privileged hardware event.
User-space processes cannot write MSRs, MMIO-mapped memory-controller registers, or CXL DVSEC capability registers directly. These are privileged interfaces gated by the CPU protection rings. The LKM is the broker: it exposes a safe device file, validates the hint, and dispatches the encoded value to whichever hardware channel is appropriate for the platform.
Device registration
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/uaccess.h>
#include <linux/device.h>
#include <linux/perf_event.h> /* for PMU path */
#include <asm/msr.h>
#define MEM_HINT_MAJOR 240
#define MEM_HINT_NAME "mem_hint"
#define MEM_HINT_MSR 0xC0010F00 /* illustrative vendor MSR */
static struct class *mem_hint_class;
static struct device *mem_hint_dev;
static const struct file_operations mem_hint_fops = {
.owner = THIS_MODULE,
.write = mem_hint_write,
.unlocked_ioctl = mem_hint_ioctl, /* optional ioctl path */
.open = mem_hint_open,
.release = mem_hint_release,
};
static int __init mem_hint_init(void)
{
int ret;
ret = register_chrdev(MEM_HINT_MAJOR, MEM_HINT_NAME, &mem_hint_fops);
if (ret < 0) {
pr_err("mem_hint: chrdev registration failed: %d\n", ret);
return ret;
}
mem_hint_class = class_create(MEM_HINT_NAME); /* >= 6.4 API; older kernels take THIS_MODULE first */
mem_hint_dev = device_create(mem_hint_class, NULL,
MKDEV(MEM_HINT_MAJOR, 0), NULL, MEM_HINT_NAME);
mem_hint_pmu_init(); /* register PMU overflow callbacks */
mem_hint_sysfs_init(); /* create /sys/bus/platform/... tree */
pr_info("mem_hint: /dev/mem_hint ready\n");
return 0;
}
static void __exit mem_hint_exit(void)
{
    device_destroy(mem_hint_class, MKDEV(MEM_HINT_MAJOR, 0));
    class_destroy(mem_hint_class);
    unregister_chrdev(MEM_HINT_MAJOR, MEM_HINT_NAME);
}
module_init(mem_hint_init);
module_exit(mem_hint_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Manish KL");
MODULE_DESCRIPTION("AI workload hint interface — patent IN 202641053160");
Write path — copy, validate, dispatch
static bool valid_phase(uint8_t p)
{
/* Accept inference phases 0x01–0x04 and training phases 0x05–0x06 */
return (p >= 0x01 && p <= 0x06);
}
static ssize_t mem_hint_write(
struct file *file,
const char __user *buf,
size_t len,
loff_t *off)
{
struct mem_workload_hint hint;
if (len < sizeof(hint))
return -EINVAL;
if (copy_from_user(&hint, buf, sizeof(hint)))
return -EFAULT;
if (!valid_phase(hint.phase_id))
return -EINVAL;
/* Clamp priority to 3-bit field maximum */
hint.priority = min_t(uint8_t, hint.priority, 7);
/* Security level: validate against active TEE context (claim 18) */
if (hint.security_level > 0 && !tee_context_active())
hint.security_level = 0; /* downgrade if no TEE active */
mem_hint_apply(&hint); /* encode and write hardware channel */
atomic_set(&current_phase_id, hint.phase_id);
sysfs_notify(&mem_hint_dev->kobj, "status", "current_phase");
return sizeof(hint);
}
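The validate-then-dispatch sequence above can be modelled as a pure function for testing. This is a behavioural sketch: the `ValueError` stands in for the driver's `-EINVAL`, and the `tee_active` flag stands in for the driver's TEE-context check:

```python
PHASES = range(0x01, 0x07)  # 0x01 Prefill ... 0x06 BackwardPass

def validate_hint(phase_id, priority, security_level, tee_active):
    """Mirror the driver's checks: reject unknown phases, clamp priority
    to the 3-bit field, and downgrade security_level when no TEE context
    is active (rather than rejecting the hint outright)."""
    if phase_id not in PHASES:
        raise ValueError("EINVAL: unknown phase_id")
    priority = min(priority, 7)
    if security_level > 0 and not tee_active:
        security_level = 0
    return phase_id, priority, security_level

# Out-of-range priority is clamped; confidential hint without a TEE is downgraded:
assert validate_hint(0x02, 9, 1, tee_active=False) == (0x02, 7, 0)
```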
Encoding and hardware dispatch
static u64 encode_hint(const struct mem_workload_hint *h)
{
u64 v = 0;
v |= ((u64)h->phase_id) << 0;
v |= ((u64)h->latency_target_ns) << 8;
v |= ((u64)h->bw_target_gbps) << 24;
v |= ((u64)h->security_level) << 40;
v |= ((u64)h->priority) << 48;
return v;
}
static void mem_hint_apply(const struct mem_workload_hint *h)
{
u64 val = encode_hint(h);
switch (platform_channel) {
case CH_MSR:
/* CPU Model-Specific Register path (claim 4) */
wrmsrl(MEM_HINT_MSR, val);
break;
case CH_MMIO:
/* Memory-controller MMIO config space (claim 5) */
iowrite64(val, mc_mmio_base + MEM_HINT_MMIO_OFFSET);
break;
case CH_CXL_DVSEC:
/* CXL Designated Vendor-Specific Extended Capability (claim 6) */
cxl_dvsec_write64(cxl_dev, DVSEC_HINT_REG, val);
break;
}
}
0xC0010F00 is illustrative — a vendor MSR in AMD's architectural space. Real deployment would use a CPU vendor's assigned MSR range coordinated with the processor firmware team. The patent claims the interface contract (structured hint → privileged channel → policy engine), not a specific MSR address.
One phase transition. One hint. Zero DRAM internals required.
Python — vLLM integration
import os, struct
# Phase constants matching the kernel driver
PHASE_PREFILL = 0x01
PHASE_DECODE = 0x02
PHASE_AGENTIC = 0x03
PHASE_IDLE = 0x04
PHASE_FORWARD = 0x05
PHASE_BACKWARD = 0x06
_hint_fd = None
def _open_hint():
global _hint_fd
if _hint_fd is None:
_hint_fd = os.open("/dev/mem_hint", os.O_WRONLY)
def send_mem_hint(phase_id, latency_ns=0,
bw_gbps=0, security=0, priority=7):
    # < B H H B B B = exactly 8 bytes (one reserved byte)
    payload = struct.pack("<BHHBBB",
                          phase_id, latency_ns,
                          bw_gbps, security,
                          priority, 0)
_open_hint()
os.write(_hint_fd, payload)
# ── vLLM hook — called at phase boundary ──────────────
class MemHintScheduler:
def on_prefill_start(self, batch_size: int):
send_mem_hint(PHASE_PREFILL,
latency_ns=200, # relaxed latency
bw_gbps=400, # maximum BW target
priority=7)
def on_decode_start(self, request_id: str):
send_mem_hint(PHASE_DECODE,
latency_ns=90, # tight P99 target
bw_gbps=150,
priority=7)
def on_tool_call(self):
send_mem_hint(PHASE_AGENTIC,
latency_ns=120,
bw_gbps=200,
priority=5)
def on_idle(self):
send_mem_hint(PHASE_IDLE, priority=3)
C — low-latency integration
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>
typedef struct __attribute__((packed)) {
uint8_t phase_id;
uint16_t latency_target_ns;
uint16_t bw_target_gbps;
uint8_t security_level;
uint8_t priority;
uint8_t reserved;
} mem_hint_t;
static int hint_fd = -1;
static inline void mem_hint_send(
uint8_t phase, uint16_t lat_ns,
uint16_t bw, uint8_t prio)
{
const mem_hint_t h = {
.phase_id = phase,
.latency_target_ns = lat_ns,
.bw_target_gbps = bw,
.security_level = 0,
.priority = prio,
};
if (hint_fd < 0)
hint_fd = open("/dev/mem_hint", O_WRONLY);
write(hint_fd, &h, sizeof(h));
}
/* Emit at phase boundary — zero DRAM internals needed */
mem_hint_send(0x02, 90, 150, 7); /* → decode */
Key design principle: the runtime never needs to know what tRCD is, what DFE tap coefficients mean, or what JEDEC says about voltage swing. It only needs to express what it is about to do. The kernel driver and Memory Policy Engine handle the translation.
Works on unmodified runtimes. Zero application changes.
Most runtimes won't integrate a new kernel API immediately. The PMU path makes the invention useful on day one — before any runtime modifications — by watching hardware performance counters and inferring the workload phase autonomously at 100-microsecond polling intervals.
Every modern CPU exposes uncore PMU events for memory traffic: read/write bandwidth, DRAM command rates, LLC miss rates. The kernel driver registers overflow callbacks on these counters. When a counter crosses its threshold, the PMU fires an interrupt into the kernel, which samples all counters together and applies the classification logic from the patent (§10.1 [0031]).
| Event | Source | Phase signal |
|---|---|---|
| UNC_M_CAS_COUNT.WR | IMC uncore | High → Prefill (KV cache fill) |
| UNC_M_CAS_COUNT.RD | IMC uncore | High, WR low → Decode (KV reads) |
| MEM_LOAD_RETIRED.L3_MISS | Core PMU | Scattered misses → Decode / Agentic |
| OFFCORE_REQUESTS.ALL_DATA_RD | Core PMU | BW variance >50% → Agentic |
| UNC_M_CMD_RATE | IMC uncore | <1000/s → Idle |
struct pmu_sample {
u64 write_bw_gbps; /* UNC_M_CAS_COUNT.WR */
u64 read_bw_gbps; /* UNC_M_CAS_COUNT.RD */
u64 llc_miss_rate; /* L3_MISS / 100µs */
u64 bw_variance_pct; /* sliding window σ */
u64 dram_cmd_rate; /* commands / second */
};
static uint8_t classify_phase(
const struct pmu_sample *s,
uint8_t prev_phase)
{
/* 1. Prefill: write-dominant, high sustained BW */
if (s->write_bw_gbps > prefill_wr_thresh &&
s->write_bw_gbps > s->read_bw_gbps)
return PHASE_PREFILL;
/* 2. Decode: read-dominant, LLC-miss heavy */
if (s->read_bw_gbps > s->write_bw_gbps * 2 &&
s->write_bw_gbps < decode_wr_ceil &&
s->llc_miss_rate > decode_llc_floor)
return PHASE_DECODE;
/* 3. Agentic: high BW variance (burst/quiet pattern) */
if (s->bw_variance_pct > agentic_variance_thresh)
return PHASE_AGENTIC;
/* 4. Idle: near-zero command rate */
if (s->dram_cmd_rate < idle_cmd_thresh)
return PHASE_IDLE;
return prev_phase; /* hysteresis: hold current */
}
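The same decision ladder is easy to exercise in Python. The threshold values here are illustrative assumptions (the driver exposes the real ones as sysfs tunables); only the ordering and hysteresis mirror the kernel classifier above:

```python
PHASE_PREFILL, PHASE_DECODE, PHASE_AGENTIC, PHASE_IDLE = 0x01, 0x02, 0x03, 0x04

def classify_phase(wr_bw, rd_bw, llc_miss, bw_var_pct, cmd_rate, prev):
    """Same decision order as the kernel classifier; thresholds assumed."""
    if wr_bw > 8 and wr_bw > rd_bw:                          # write-dominant
        return PHASE_PREFILL
    if rd_bw > 2 * wr_bw and wr_bw < 2 and llc_miss > 1000:  # read-dominant
        return PHASE_DECODE
    if bw_var_pct > 50:                                      # bursty traffic
        return PHASE_AGENTIC
    if cmd_rate < 1000:                                      # quiet bus
        return PHASE_IDLE
    return prev                                              # hysteresis: hold

# KV-cache fill: 20 GB/s writes, 5 GB/s reads -> prefill
assert classify_phase(20, 5, 500, 10, 1e6, PHASE_IDLE) == PHASE_PREFILL
# Token decode: 12 GB/s reads, 1 GB/s writes, miss-heavy -> decode
assert classify_phase(1, 12, 5000, 10, 1e6, PHASE_PREFILL) == PHASE_DECODE
```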
/* PMU overflow NMI handler — fires every 100µs */
static void pmu_overflow_handler(
struct perf_event *event,
struct perf_sample_data *data,
struct pt_regs *regs)
{
struct pmu_sample s = collect_pmu_sample();
uint8_t phase = classify_phase(&s, current_phase);
if (phase != current_phase) {
struct mem_workload_hint h = phase_to_hint(phase);
mem_hint_apply(&h);
current_phase = phase;
}
}
Tune without recompiling the runtime.
The /dev/mem_hint device path is optimised for fast per-transition hints. The sysfs interface is the operational control plane: policy thresholds, per-phase parameter overrides, live observability of the classifier's current state, and ECC/latency telemetry from the feedback loop.
Operator workflows
# Tighten decode margins further
# (valid if ecc_correctable_rate is near zero)
cat /sys/.../status/ecc_correctable_rate
# → 0
echo 16 | sudo tee /sys/.../policy/decode_trcd
# Monitor live phase transitions
watch -n 0.1 cat /sys/.../status/current_phase
# Lower prefill BW trigger for smaller models
echo 6 | sudo tee /sys/.../thresholds/prefill_write_bw_gbps
# Disable idle PLL reduction during benchmarking
echo 0 | sudo tee /sys/.../policy/idle_pll_reduction
# Check measured P99 during decode run
cat /sys/.../status/p99_latency_ns
# → 87 (target was 90ns — within budget)
# Security hardening for TEE workload:
# force nominal timing regardless of phase
echo 22 | sudo tee /sys/.../policy/decode_trcd
The sysfs interface connects directly to the patent's closed-loop feedback mechanism (§10.4 [0037]): an operator can observe the ECC correctable error rate and measured P99 latency from status/, then tighten or relax the policy/decode_trcd knob accordingly — all while inference is running.
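One step of that loop can be written down as a policy function. This is a hypothetical operator heuristic, not part of the patent; the 16–24 clock bounds stand in for SPD-derived limits:

```python
def adjust_decode_trcd(trcd, ecc_rate, p99_ns, target_ns,
                       trcd_min=16, trcd_max=24):
    """One iteration of an operator feedback loop (illustrative policy).
    Back off when ECC errors appear; tighten when latency misses target
    on a clean link; otherwise hold."""
    if ecc_rate > 0:               # any correctable errors: relax first
        return min(trcd + 1, trcd_max)
    if p99_ns > target_ns:         # missing the latency budget: tighten
        return max(trcd - 1, trcd_min)
    return trcd                    # on target, clean link: hold

# Clean link but P99 = 95 ns against a 90 ns target -> tighten one clock:
assert adjust_decode_trcd(18, ecc_rate=0, p99_ns=95, target_ns=90) == 17
# ECC errors appearing at tRCD = 16 -> relax one clock:
assert adjust_decode_trcd(16, ecc_rate=3, p99_ns=87, target_ns=90) == 17
```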
Three conduits. One policy engine. Full PHY coverage.
MSR channel — fast CPU-local write. The kernel executes wrmsrl(MEM_HINT_MSR, val) — a single ring-transition-free privileged instruction. The CPU firmware or an embedded microcontroller on the memory controller die reads the MSR on the next command dispatch interval and routes the encoded hint to the policy engine.
Best for: integrated CPU+MC platforms (server SoCs, ARM Neoverse V-series) where the memory controller is on-die.
MMIO channel — the kernel writes to a physical address in the memory-controller configuration space, mapped via ioremap() at driver init. The MMIO write crosses the PCIe or system fabric to the MC ASIC. Suited for discrete memory-controller ASICs where no shared MSR architecture exists.
Best for: HEDT platforms, multi-chip-module designs, and memory controller ASICs from Rambus or Synopsys IP customers.
CXL DVSEC channel — hints are written to a Designated Vendor-Specific Extended Capability register in the CXL device configuration space. The CXL device firmware reads the hint and applies PHY configuration independently per HDM (Host-managed Device Memory) region — enabling different signaling for local DDR5 vs. CXL-attached expansion memory simultaneously.
Best for: CXL memory expansion pods, disaggregated memory fabric, heterogeneous memory topologies (claim 15–16).
| Phase | tRCD | V_swing | DFE Tap1 | CTLE | Refresh | Objective |
|---|---|---|---|---|---|---|
| Prefill (0x01) | 22 (nom.) | 300 mV | 0x10 | +2 dB | Nominal | Max sustained BW, link stability |
| Decode (0x02) | 18 (−4) | 280 mV | 0x14 ↑ | +3 dB | Nominal | Min P99 latency, tight margins |
| Agentic (0x03) | 20 (−2) | 290 mV | 0x12 | +2 dB | Nominal | Burst tolerance, balanced |
| Idle (0x04) | 24 (+2) | 240 mV | 0x08 ↓ | 0 dB | Reduced + self-refresh | Min power, PLL quiesce |
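Expressed as data, the policy table above is just a phase-indexed lookup. The values below are transcribed from the table; the field names and the fallback-to-nominal behaviour are illustrative assumptions:

```python
# Per-phase PHY policy, transcribed from the table above (illustrative).
PHY_POLICY = {
    0x01: dict(tRCD=22, vswing_mv=300, dfe_tap1=0x10, ctle_db=2),  # Prefill
    0x02: dict(tRCD=18, vswing_mv=280, dfe_tap1=0x14, ctle_db=3),  # Decode
    0x03: dict(tRCD=20, vswing_mv=290, dfe_tap1=0x12, ctle_db=2),  # Agentic
    0x04: dict(tRCD=24, vswing_mv=240, dfe_tap1=0x08, ctle_db=0),  # Idle
}

def policy_for(phase_id):
    """Assumed fallback: unknown phases get the nominal (prefill) config."""
    return PHY_POLICY.get(phase_id, PHY_POLICY[0x01])

assert policy_for(0x02)["tRCD"] == 18         # decode: tightest timing
assert policy_for(0x04)["vswing_mv"] == 240   # idle: lowest swing
```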
Software expresses intent. Hardware enforces the law.
A software-visible memory timing interface is only credible if it cannot be weaponised. If user-space code — or even kernel code — could arbitrarily reduce tRCD or raise V_swing, the interface would be a fault-injection attack surface: a software-controlled analogue of Rowhammer, capable of inducing bit-flips in adjacent DRAM rows, or a thermal throttling vector for an unprivileged process to degrade a co-tenant's memory performance in a cloud environment.
The hardware safety limiter (claims 9–10, §10.4 [0038]) addresses this at the architecture level. It is not firmware — it is immutable combinational logic within the memory controller ASIC, anchored to the platform hardware root of trust. It receives proposed PHY configurations from the Memory Policy Engine and passes them through a clamp function before they reach any hardware write path.
What the safety limiter prevents: A security_level=0 hint with priority=7 requesting tRCD=8 (far below JEDEC minimum) — silently clamped to the SPD EEPROM floor. An idle hint requesting V_swing=180mV (below spec) — clamped to 200mV minimum. Software at any privilege level cannot bypass this path.
struct phy_config {
u8 tRCD, tCL, tRP, tRAS;
u16 vswing_mv;
u8 dfe_tap[4];
s8 ctle_gain_db;
};
struct jedec_limits {
    u8  min_tRCD, max_tRCD;     /* from SPD EEPROM */
    u8  min_tCL,  max_tCL;
    u16 min_vswing, max_vswing;
    /* ... read-only after boot ... */
};
static struct phy_config safety_clamp(
struct phy_config proposed,
struct jedec_limits limits,
struct ecc_telemetry ecc)
{
/* Hard clamp to JEDEC bounds — hardware enforced */
proposed.tRCD = clamp(proposed.tRCD,
limits.min_tRCD,
limits.max_tRCD);
proposed.tCL = clamp(proposed.tCL,
limits.min_tCL, limits.max_tCL);
proposed.vswing_mv = clamp(proposed.vswing_mv,
limits.min_vswing, limits.max_vswing);
/* Feedback: relax 1 clk if ECC rate above threshold */
if (ecc.correctable_rate > ECC_WARN_THRESHOLD) {
proposed.tRCD = min(proposed.tRCD + 1, limits.max_tRCD);
proposed.tCL = min(proposed.tCL + 1, limits.max_tCL);
}
/* Security hardening: hold nominal if TEE active (claim 17) */
if (current_security_level > 0) {
proposed.tRCD = max(proposed.tRCD, JEDEC_NOMINAL_tRCD);
proposed.tCL = max(proposed.tCL, JEDEC_NOMINAL_tCL);
}
return proposed; /* safe to apply to hardware */
}
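The two scenarios described above are worth checking against a behavioural model of the clamp. The limit values here are illustrative stand-ins for the SPD EEPROM contents; only the clamp ordering mirrors the hardware logic:

```python
def safety_clamp(trcd, vswing_mv, limits, tee_active=False,
                 nominal_trcd=22):
    """Python model of the hardware clamp: JEDEC bounds always win, and
    an active TEE forces timing back to nominal (claim 17). Limit values
    are assumptions standing in for SPD EEPROM contents."""
    trcd = max(limits["min_trcd"], min(trcd, limits["max_trcd"]))
    vswing_mv = max(limits["min_vswing"],
                    min(vswing_mv, limits["max_vswing"]))
    if tee_active:
        trcd = max(trcd, nominal_trcd)
    return trcd, vswing_mv

LIMITS = {"min_trcd": 16, "max_trcd": 24,
          "min_vswing": 200, "max_vswing": 320}

# The two scenarios from the text: tRCD=8 and V_swing=180 mV are both
# silently pulled back to the hardware floor.
assert safety_clamp(8, 180, LIMITS) == (16, 200)
# TEE active: an aggressive decode config is held at nominal timing.
assert safety_clamp(18, 280, LIMITS, tee_active=True) == (22, 280)
```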
"The hardware safety limiter cannot be overridden, bypassed, or reprogrammed by software executing at any privilege level." — This includes ring-0 kernel code, SMM firmware, and hypervisor code. The limiter sits below all software layers, anchored in silicon.
A new contract between AI software and memory hardware.
| Current systems | /dev/mem_hint style systems |
|---|---|
| Memory tuning is static (boot-time SPD profiles) or reactive (post-degradation retraining). | Memory tuning is predictive: the PHY is reconfigured 500µs before the phase transition completes. |
| Application phase is invisible to the memory controller. Prefill and decode look similar at the PHY level. | Application phase is a structured hardware input, expressed as a 64-bit hint with latency target and priority. |
| Linux hints like madvise / mlock affect page placement, not PHY signaling. | Hints target actual memory signaling: tRCD, V_swing, DFE taps, CTLE — the parameters that determine real latency. |
| 2.1ms phase-transition retraining penalty every time decode follows prefill. | <0.1ms transition overhead. >95% of the penalty eliminated by predictive pre-adjustment. |
| Memory subsystem idles at full power between inference requests. | Idle phase reduces V_swing, triggers DRAM self-refresh, reduces PHY PLL frequency: −39% idle power. |
| Runtimes, OS, firmware, and PHY evolve in completely separate silos. | A shared hint contract creates a coordinated stack: runtime → kernel → firmware → PHY. |
"The present invention introduces, for the first time, a structured cross-layer coordination architecture wherein an AI software runtime directly communicates workload phase semantics to the memory subsystem via a privileged hint interface. Adaptation is predictive, occurring before a phase transition is completed, not reactive to measured degradation."
— Indian Patent Application 202641053160, §6.2 Novelty Summary
Memory needs an intent interface. Now it has one.
The memory wall is real. As models grow and inference systems scale, the constraint is not arithmetic — it is movement. Where data lives, when it moves, how memory links are trained, and whether the hardware can anticipate a phase change instead of discovering it after latency has already been paid.
/dev/mem_hint is a concrete, implementable answer to this problem. It is small enough that a vLLM patch could integrate it in an afternoon. It is low-level enough to connect to real memory policy engines. It is safe enough — through the hardware root-of-trust safety limiter — to deploy in multi-tenant cloud environments without creating a new attack surface.
Three deployment modes mean it works everywhere: on runtimes that have been patched, on runtimes that haven't, and with operator-controlled policy tuning for every environment in between.
The invention is on file. The complete specification is due by April 26, 2027. The ideas are in the open.
Patent Pending Notice
This technology is the subject of Indian Patent Application No. 202641053160, filed as a provisional specification under the Patents Act, 1970 at the Indian Patent Office, Chennai, on 26 April 2026. All rights reserved. Patent Pending.
Bengaluru, Karnataka, India