Original Research · April 26, 2026

Teaching Computers to Remember Smarter

How I filed a patent on a system that lets AI software directly tell memory hardware what it needs — before it needs it — achieving 20% lower latency, 39% less idle power, and eliminating 95% of transition overhead.

Application No. 202641053160
Filed 26 April 2026
Office Indian Patent Office, Chennai
Status ● Provisional Filed
Claims 40  |  5 Figures

AI hit a wall. Not a compute wall — a memory wall.

Everyone talks about GPUs. Billions of dollars poured into compute. But here is the quiet truth that anyone who has run LLM inference at scale already knows: the bottleneck is memory. Not how fast you can multiply matrices — how fast you can feed those matrices data.

When a large language model like LLaMA-3 70B is generating text, it needs to read its entire KV cache — potentially hundreds of gigabytes — on every single token step. That is a random-access memory problem, not a compute problem. And the memory system has no idea this is happening.

"The memory controller is a black box. It responds to requests. It has no idea if those requests are latency-critical or bandwidth-hungry."

Today's DRAM systems operate on static timing margins set at boot from an SPD EEPROM chip. Those margins are conservative — designed for worst-case conditions across every workload that might ever run. A prefill sweep that needs maximum sustained bandwidth gets the same timing as a decode step that needs minimum latency. An idle system between inference requests keeps all its power circuits fully energized, burning watts for nothing.

I spent time deeply studying the Rambus and ARM memory architecture ecosystems, and I kept coming back to the same question: why doesn't the software tell the hardware what it's about to do?

Every LLM runtime knows exactly what phase it's in. It knows when prefill ends and decode begins. It knows when the agent is in a planning loop. Why isn't that information flowing down to the memory controller?

That gap is the invention.

Decode Latency: 20% reduction in P99 random-read latency for the LLaMA-3 70B decode phase
Idle Power: 39% lower memory-subsystem power during idle phases (2-socket, 16-DIMM system)
Transition Overhead: >95% elimination of the phase-transition retraining penalty (2.1 ms → <0.1 ms)
Prefill Bandwidth: 6.5% increase in sustained memory bandwidth during prefill across 8 DDR5-6400 channels

Memory system parameters adapted: tRCD, tCL, tRP, tRAS, V_swing, DFE Taps 1–4, CTLE gain, per-lane skew, refresh rate, PHY PLL frequency

A cross-layer bridge between AI software and memory physics

The patent — formally titled "System and Method for Software-Defined, Workload-Aware Adaptive Memory Signaling and Timing Control in Artificial Intelligence Computing Systems" — describes a four-component architecture that lets AI runtimes communicate workload phase information directly to memory hardware, which then pre-adjusts its signaling parameters before the next phase begins.

Fig. 1 — System Architecture. Three layers connected by the Workload Hint Interface: a software layer (LLM inference engine with token-rate and KV-cache monitors, agentic AI scheduler with loop and syscall-pattern detection, OS/power manager with C-state and thermal governors) issues 64-bit hints {phase_id, latency_ns, bw_gbps, security_level, priority} over MSR, MMIO, or CXL DVSEC; a firmware layer hosts the Memory Policy Engine (phase-to-config lookup, predictive scheduler, feedback integrator) guarded by the immutable root-of-trust HW Safety Limiter; and a hardware layer (memory controller and PHY adjusting tRCD/tCL/tRP/tRAS, V_swing, DFE Taps 1–4, CTLE, per-lane skew, and refresh) drives DDR5 DIMM, MRDIMM (MRCD+MDB), HBM2e/HBM3, and CXL.mem. Telemetry (ECC correctable/uncorrectable events, read/CMD retry counts, measured access latency, thermal and CXL congestion) feeds back up the stack.

Component 1: Runtime Workload Classifier

Running inside the AI software stack — or as an OS kernel module — this component continuously samples performance counters at 100-microsecond intervals. It watches token emission rate, KV cache allocation bandwidth, system call patterns, and CPU C-state transitions. From these signals, it classifies the current execution into one of six phases.
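
As a sketch, the classifier reduces to a small decision function over sampled counters. The counter fields, thresholds, and phase rules below are illustrative assumptions, not the patented classifier itself:

#include <stdbool.h>

enum phase {
    PHASE_PREFILL = 0x01, PHASE_DECODE = 0x02, PHASE_AGENTIC = 0x03,
    PHASE_IDLE    = 0x04, PHASE_FWD    = 0x05, PHASE_BWD     = 0x06
};

struct counters {               /* sampled every 100 µs */
    double token_rate;          /* tokens emitted per second       */
    double kv_alloc_gbps;       /* KV cache allocation bandwidth   */
    double mem_cmd_rate;        /* memory commands per microsecond */
    bool   in_c6;               /* CPU parked in C6 idle state     */
};

static enum phase classify(const struct counters *c)
{
    if (c->in_c6 && c->mem_cmd_rate < 0.01)
        return PHASE_IDLE;      /* near-zero command rate           */
    if (c->kv_alloc_gbps > 50.0)
        return PHASE_PREFILL;   /* bulk sequential KV writes        */
    if (c->token_rate > 1.0)
        return PHASE_DECODE;    /* steady one-token-per-pass rhythm */
    return PHASE_AGENTIC;       /* irregular bursts between tools   */
}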

Component 2: Workload Hint Interface

A 64-bit structured register write — transmitted via CPU Model-Specific Register, MMIO, or CXL DVSEC sideband — carries a compact hint from the software classifier to the firmware layer. The hint contains a phase identifier, latency target, bandwidth target, security level, and priority. This interface is the key novelty: it is the first time an AI runtime can express its memory semantics in hardware-readable form.

Component 3: Memory Policy Engine

A firmware component (in the CPU or memory controller ASIC) that holds a policy lookup table mapping phase identifiers to complete signaling configurations. It predicts upcoming transitions and schedules pre-adjustments 500 microseconds before a phase change — proactively, not reactively. It also runs a closed-loop feedback controller using ECC error rates and measured latency to continuously refine margins.
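
A minimal sketch of the lookup side, using the four inference-phase configurations from the phase table below; the feedback rule is an illustrative assumption (the patent's controller also folds in measured latency):

#include <stdint.h>

struct signaling_cfg {
    uint8_t  trcd_clk;          /* RAS-to-CAS delay, in clocks */
    uint16_t vswing_mv;         /* transmit voltage swing      */
    uint8_t  dfe_tap1;          /* DFE first-tap weight        */
};

static const struct signaling_cfg policy_table[] = {
    [0x01] = { 22, 300, 0x10 }, /* Prefill: max bandwidth   */
    [0x02] = { 18, 280, 0x14 }, /* Decode:  min latency     */
    [0x03] = { 20, 290, 0x12 }, /* Agentic: burst tolerance */
    [0x04] = { 24, 240, 0x08 }, /* Idle:    min power       */
};

static struct signaling_cfg lookup(uint8_t phase_id)
{
    return policy_table[phase_id];      /* O(1) phase-to-config */
}

/* Closed-loop trim: any correctable-ECC activity in the last
   interval buys back one clock of tRCD margin. */
static struct signaling_cfg refine(struct signaling_cfg cfg,
                                   uint32_t ecc_correctable_per_s)
{
    if (ecc_correctable_per_s > 0)
        cfg.trcd_clk += 1;
    return cfg;
}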

Component 4: Memory Interface Subsystem

The hardware layer — DDR5 controller, PHY circuitry, DRAM devices — that receives and applies the configurations. Every configuration is validated by an immutable hardware safety limiter anchored in the platform root of trust, ensuring no software can push the memory outside JEDEC-compliant bounds regardless of what any hint says.

/* Workload Hint Interface: 64-bit packed structure */
struct mem_workload_hint {
    uint8_t  phase_id;           /* 0x01 Prefill, 0x02 Decode, 0x03 Agentic,
                                    0x04 Idle, 0x05 ForwardPass, 0x06 BackwardPass */
    uint16_t latency_target_ns;
    uint16_t bw_target_gbps;
    uint8_t  security_level;
    uint8_t  priority;           /* 0–7 */
    uint8_t  reserved;           /* pad to exactly 8 bytes */
} __attribute__((packed));       /* Total: 8 bytes = 1 register write */
Transmission channels
MSR: CPU Model-Specific Register 0xC001_XXXX
MMIO: Memory controller configuration space
CXL DVSEC: CXL device sideband register
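
Putting the pieces together, a runtime entering decode could post its hint with a single store. The register offset and mmio_write64 helper here are assumptions of the sketch; only the structure itself comes from the specification above:

#include <stdint.h>
#include <string.h>

#define MC_HINT_REG 0x80        /* hypothetical MMIO config-space offset */

extern void mmio_write64(uint64_t offset, uint64_t value);

static void send_decode_hint(void)
{
    struct mem_workload_hint hint = {
        .phase_id          = 0x02,  /* Decode                 */
        .latency_target_ns = 100,   /* sub-100 ns read target */
        .bw_target_gbps    = 64,
        .security_level    = 0,
        .priority          = 7,     /* highest of 0–7         */
    };
    uint64_t packed;
    memcpy(&packed, &hint, sizeof packed);  /* exactly 8 bytes    */
    mmio_write64(MC_HINT_REG, packed);      /* one register write */
}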

Every phase of AI execution has a different memory personality

The core insight is that LLM inference is not a single workload — it is at least four fundamentally different memory access patterns that happen to run on the same hardware. Treating them identically is leaving performance on the table.

Prefill (phase_id 0x01)
Process the entire input sequence in parallel. Massive sustained write bandwidth to the KV cache.
tRCD: 22 clk (nominal)
V_swing: 300 mV
DFE Tap1: 0x10
Goal: Max bandwidth, link stability

Decode (phase_id 0x02)
One token per forward pass. Latency-critical random reads from the KV cache. Sub-100 ns target.
tRCD: 18 clk (−4, tightened)
V_swing: 280 mV
DFE Tap1: 0x14 (boosted)
Goal: Minimum latency

Agentic (phase_id 0x03)
Tool calls, API integration, multi-step planning. Irregular burst-mode access pattern.
tRCD: 20 clk (−2, balanced)
V_swing: 290 mV
DFE Tap1: 0x12
Goal: Burst tolerance

Idle (phase_id 0x04)
Between inference requests. CPU in C6, near-zero memory command rate.
tRCD: 24 clk (+2, relaxed)
V_swing: 240 mV
DFE Tap1: 0x08 (reduced)
Goal: Minimum power
"The system applies configurations predictively — 500 microseconds before a phase transition — so the memory is already optimized when the first new-phase request arrives."

That 500-microsecond window is critical. Without predictive pre-adjustment, a conventional system would detect the phase change only after new-phase requests start arriving, then spend 2+ milliseconds retraining its PHY before achieving optimal performance. My system eliminates that penalty almost entirely — >95% reduction — because the hardware is already reconfigured when the transition happens.
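
The scheduling itself is simple once the classifier can forecast a transition. A sketch, where schedule_at_us and apply_phase_config are hypothetical firmware helpers:

#include <stdint.h>

#define PREADJUST_LEAD_US 500   /* apply config 500 µs before transition */

extern void apply_phase_config(uint8_t phase_id);
extern void schedule_at_us(uint64_t t_us,
                           void (*fn)(uint8_t), uint8_t arg);

static void on_transition_forecast(uint64_t t_transition_us,
                                   uint8_t next_phase)
{
    /* Queue the reconfiguration early so the PHY has settled by the
       time the first new-phase request arrives; no 2 ms retrain. */
    schedule_at_us(t_transition_us - PREADJUST_LEAD_US,
                   apply_phase_config, next_phase);
}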

For training workloads, two additional phases extend the framework: ForwardPass (0x05) receives bandwidth-optimised signaling, and BackwardPass (0x06) reduces write recovery time (tWR) to increase gradient write throughput. A GPU-side firmware agent receives synchronised hints via shared-memory IPC, enabling cross-device optimisation during distributed training steps.

What exists — and why this is different

Five categories of prior art were analysed. None of them — individually or in combination — teach the cross-layer, predictive, software-defined architecture that this patent introduces.

US20230195873A1 (Dynamic DRAM Timing Adjustment)
What it does: Adapts PHY timing based on measured signal-integrity telemetry and die temperature.
Why it's different: Entirely reactive; it updates only after degradation is detected. No software workload hints, no AI metrics, no predictive control.
Verdict: ✓ Distinguished

US11409612B2 (Adaptive Refresh Controller)
What it does: Adjusts DRAM refresh rate based on observed access patterns.
Why it's different: Scope is exclusively refresh control. Does not address tRCD/tCL/tRP, voltage swing, equalization, or cross-layer coordination.
Verdict: ✓ Distinguished

CXL Spec v2.0/v3.0 (CXL Consortium)
What it does: Memory tiering, coherency protocols, latency-based migration policies.
Why it's different: Does not disclose workload-hint-driven modification of PHY signaling parameters based on AI execution phase. Per-region PHY adaptation is not contemplated.
Verdict: ✓ Distinguished

AMD EXPO / Intel XMP (Memory Profiles)
What it does: Static overclocking profiles selectable at boot.
Why it's different: Fixed at boot. Cannot adapt at runtime based on workload phase. No AI runtime integration.
Verdict: ✓ Distinguished

ISCA 2022 (ML for DRAM Timing Optimization)
What it does: ML inference on physical telemetry (eye closure, temperature) to predict safe margin reductions.
Why it's different: Hardware-observable metrics only. Cannot distinguish decode from prefill; both look similar at the PHY level. No software-layer interface.
Verdict: ✓ Distinguished
Claim 1 — Independent Claim (System)
"A system for adaptive memory signaling control... wherein the memory signaling configuration is applied predictively, before a workload phase transition is completed, and all applied configurations are validated by a hardware safety limiter operating within a hardware trust boundary separate from the memory policy engine."

40 claims across seven innovation dimensions

Safety — The Hardware Root of Trust

One of the most important design decisions was making the safety limiter genuinely immutable. If software can instruct the memory controller to change voltage and timing, it could in principle be used as a fault-injection attack vector — a software-triggered version of Rowhammer, or a thermal throttling attack against an adjacent security context.

The patent explicitly claims a hardware safety limiter implemented as immutable logic within the memory controller ASIC, anchored to the platform root of trust, that reads SPD EEPROM data at boot and clamps every proposed configuration to within JEDEC JESD79-5 compliant bounds. Software cannot override, bypass, or reprogram it at any privilege level.
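
In code terms the limiter is nothing more exotic than an unbypassable clamp stage between the policy engine and the PHY registers. A sketch; the bounds below are illustrative stand-ins for the SPD-derived JESD79-5 limits:

#include <stdint.h>

struct proposed_cfg {
    uint8_t  trcd_clk;
    uint16_t vswing_mv;
};

/* Latched from SPD EEPROM at boot; immutable thereafter. */
static const struct {
    uint8_t  trcd_min, trcd_max;
    uint16_t vswing_min, vswing_max;
} spd_limits = { 16, 30, 220, 320 };

static uint16_t clamp_u16(uint16_t v, uint16_t lo, uint16_t hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Every configuration passes through here; there is no other path
   to the PHY registers, at any privilege level. */
static struct proposed_cfg enforce_jedec_bounds(struct proposed_cfg c)
{
    c.trcd_clk  = (uint8_t)clamp_u16(c.trcd_clk,
                                     spd_limits.trcd_min, spd_limits.trcd_max);
    c.vswing_mv = clamp_u16(c.vswing_mv,
                            spd_limits.vswing_min, spd_limits.vswing_max);
    return c;
}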

CXL Fabric — Per-Region Independent Adaptation

Modern data centre memory is no longer a single flat DDR channel. CXL 2.0/3.0 enables pooled memory, memory expansion, and heterogeneous topologies where a single host may see local DDR5 DIMMs, CXL Type-2 accelerator-attached memory, and remote memory expansion nodes — all simultaneously.

The patent covers per-region independent PHY adaptation across CXL HDM regions, and adds a further embodiment: when CXL fabric link utilization exceeds 80%, the system relaxes CXL link timing to prioritise reliability, while simultaneously tightening local DDR5 margins to compensate for the lost fabric bandwidth. This congestion-responsive co-adaptation is not addressed in the CXL specification.
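
A sketch of that co-adaptation rule; the 80% threshold comes from the embodiment above, while the margin steps are illustrative assumptions:

#include <stdint.h>

static void co_adapt(double fabric_util,
                     uint8_t *ddr5_trcd_clk, uint8_t *cxl_margin_clk)
{
    if (fabric_util > 0.80) {
        *cxl_margin_clk = 2;            /* relax CXL link timing for reliability   */
        if (*ddr5_trcd_clk > 18)
            *ddr5_trcd_clk -= 1;        /* tighten local DDR5 to recover bandwidth */
    } else {
        *cxl_margin_clk = 0;            /* nominal link timing */
    }
}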

NUMA Multi-Socket — IPI Hint Propagation

In multi-socket servers, thread migration between NUMA domains is a common OS scheduler event. Without hint propagation, the destination socket would be left operating on stale signaling parameters for the duration of its next retraining interval. The patent claims simultaneous update of source and destination Memory Policy Engines via an inter-processor interrupt carrying the active workload hint — eliminating the NUMA-migration penalty entirely. Empirical modelling shows 12–18% cross-socket latency improvement from this alone.
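
A sketch of the propagation hook; the IPI vector and send_ipi helper are assumptions, and in practice this would live in the OS scheduler's migration path:

#include <stdint.h>

#define HINT_SYNC_VECTOR 0xF2   /* hypothetical IPI vector */

extern void send_ipi(int socket, uint8_t vector, uint64_t payload);

static void on_numa_migrate(int src_socket, int dst_socket,
                            uint64_t packed_hint /* active 64-bit hint */)
{
    /* Update both policy engines in the same scheduler tick, so the
       destination never runs on stale signaling parameters. */
    send_ipi(dst_socket, HINT_SYNC_VECTOR, packed_hint);
    send_ipi(src_socket, HINT_SYNC_VECTOR, packed_hint);
}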

Speculative Prefetch — Attention-Derived

When an LLM runtime knows which KV cache attention heads will be needed in the next decode step — which is often determinable from the current token's attention pattern — it can transmit a speculative prefetch hint identifying the anticipated DRAM row addresses. The memory controller issues speculative row activations before the actual read request arrives, eliminating the tRCD penalty (18 clocks = 5.6ns at DDR5-6400) for predictable accesses. Simulation on LLaMA-3 70B with grouped-query attention shows 8–14% additional P99 latency reduction on top of the base phase-aware system.
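
A sketch of how such a hint might be encoded; the field widths, register offset, and helper are assumptions of the sketch, not the claimed format:

#include <stdint.h>

#define PREFETCH_HINT_REG 0x88  /* hypothetical MMIO offset */

extern void mmio_write64(uint64_t offset, uint64_t value);

/* Pack {rank:4 | bank_group:3 | bank:2 | row:18} into the low bits,
   a confidence score at bit 32, and a validity flag at bit 63. */
static uint64_t encode_row_hint(uint8_t rank, uint8_t bg, uint8_t bank,
                                uint32_t row, uint8_t confidence)
{
    return ((uint64_t)1          << 63) |
           ((uint64_t)confidence << 32) |
           ((uint64_t)rank       << 23) |
           ((uint64_t)bg         << 20) |
           ((uint64_t)bank       << 18) |
           ((uint64_t)row & 0x3FFFF);
}

static void post_speculative_activate(uint32_t predicted_row)
{
    /* Open the row predicted for the next decode step so the later
       read skips tRCD (about 5.6 ns at DDR5-6400). */
    mmio_write64(PREFETCH_HINT_REG,
                 encode_row_hint(0, 0, 0, predicted_row, 200));
}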

Claims Breakdown
System claims (apparatus): Claims 1–25, 31–40
Method claims: Claims 26–30
Total independent claims: 2
Drawings (figures): 5
Prior art references: 5
Memory topologies covered: DDR5/DDR6 · MRDIMM (MRCD+MDB) · HBM2e/HBM3 · CXL 2.0/3.0 · NVLink GPU memory · NUMA multi-socket

Seven sectors where this matters commercially

01
Data Centre AI Inference
Inference-as-a-service platforms where per-server latency improvements aggregate to massive fleet-scale cost savings. A 20% reduction in decode latency translates directly into throughput and SLA improvements.
02
Edge AI Appliances
Where power budgets are constrained and battery life matters. The 39% idle-power reduction is commercially critical for always-on inference at the edge.
03
AI Training Clusters
Multi-GPU distributed training where forward/backward pass phase awareness reduces job completion time and energy consumption at scale.
04
CPU / SoC Manufacturers
ARM Neoverse V-series, x86-64 server CPUs, RISC-V AI SoCs — all benefit from integrating the Memory Policy Engine. A natural firmware-layer addition to next-generation memory controllers.
05
Memory Controller IP
Rambus, Synopsys, and other PHY IP vendors whose products are direct implementation vehicles for the claimed signaling adaptation. A natural licensing target.
06
DIMM / HBM Manufacturers
MRDIMM-specific MRCD+MDB programming embodiments provide a differentiating feature for next-generation DDR5 and HBM3 product lines.
07
Cloud Hyperscalers
At fleet scale, marginal per-server improvements become enormous in aggregate. A hyperscaler running 100,000 inference nodes sees the 39% idle-power reduction as a data centre power and cooling budget line item.

The road from provisional to granted patent

26 April 2026
Provisional Specification Filed
Application 202641053160 filed at the Indian Patent Office, Chennai. Fee of ₹1,600 paid. Receipt CBR No. 35104. Ref: TEMP/E1/58075/2026-CHE. 40 claims, 5 figures, 5 prior art references.
By 26 April 2027
Complete Specification Due
Under Section 9(1) of the Patents Act 1970, the complete specification must be filed within 12 months of the provisional filing date. This is a hard deadline — missing it results in abandonment of the application.
Recommended: Q3 2026
PCT International Application (Optional)
Filing a PCT application within 12 months of the provisional establishes an international filing date across 150+ countries. This preserves the option to prosecute in the US, EU, China, Japan, and Korea — where the primary commercial markets for memory controller IP and AI infrastructure exist.
~18–24 months post-complete
Examination & Grant
The Indian Patent Office typically completes examination within 2–3 years. A request for expedited examination (Form 18A) is available for applications meeting certain criteria and can accelerate this timeline.

Filing Details

Indian Patent Office · Chennai Jurisdiction

Application Number 202641053160
Reference Number TEMP/E1/58075/2026-CHE
Filing Date 26 April 2026
Docket / CBR No. 61982 / 35104
Transaction ID N-0001936456
Fee Paid ₹ 1,600 · Online Bank Transfer
Type Provisional Specification · Form 1
Applicant Manish Keshav Lachwani · Natural Person · Indian
Complete Spec Deadline 26 April 2027

The memory wall is real. The moat is real.

The AI industry is in an arms race over compute. Chips, interconnects, cooling. But I believe the next phase of the race — the efficiency phase — will be won by the teams that close the gap between what the software knows and what the hardware does.

Memory is the bridge between compute and data. Right now that bridge is dumb: it doesn't know what's crossing it or why. This patent is about making it smart — giving it the information it needs to do its job 20% faster, 39% more efficiently, and without the performance cliffs that happen every time an AI workload changes gear.

The technical novelty is real. The prior art is clearly distinguished. The commercial applicability spans seven industry segments. And it's filed — patent pending in India as of today, with the option to go global within the year.

If you're working in memory controller IP, AI infrastructure, or LLM serving systems and want to talk about this, reach out.

Patent Pending Notice

This technology is the subject of Indian Patent Application No. 202641053160 filed as a provisional specification under the Patents Act, 1970 (39 of 1970) at the Indian Patent Office, Chennai, on 26 April 2026. All rights reserved. Patent Pending.

Quick Reference Summary
Title: System and Method for Software-Defined, Workload-Aware Adaptive Memory Signaling and Timing Control in AI Computing Systems
Core Innovation: Cross-layer workload hint interface enabling AI runtimes to predictively configure memory PHY signaling parameters
Key Result: 20% decode latency reduction, 39% idle power reduction, >95% transition penalty elimination
Memory Coverage: DDR5, MRDIMM, HBM2e/HBM3, CXL 2.0/3.0, NUMA multi-socket, multi-GPU
Claims: 40 total (2 independent, 38 dependent) across system and method
Safety: Immutable hardware root-of-trust safety limiter — JEDEC-compliant at all times
Filing: Indian Patent Office · Provisional · Application 202641053160
Inventor: Manish Keshav Lachwani, Bengaluru, India