Original Research April 26, 2026

Teaching Computers
to Remember
Smarter

How I filed a patent on a system that lets AI software directly tell memory hardware what it needs — before it needs it — achieving 20% lower latency, 39% less idle power, and eliminating 95% of transition overhead.

Application No. 202641053160

Filed 26 April 2026

Office Indian Patent Office, Chennai

Status ● Provisional Filed

Claims 40 | 5 Figures

Read the full story

01 The Problem

AI hit a wall. Not a compute wall — a memory wall.

Everyone talks about GPUs. Billions of dollars poured into compute. But here is the quiet truth that anyone who has run LLM inference at scale already knows: the bottleneck is memory. Not how fast you can multiply matrices — how fast you can feed those matrices data.

When a large language model like LLaMA-3 70B is generating text, it needs to read its entire KV cache — potentially hundreds of gigabytes — on every single token step. That is a random-access memory problem, not a compute problem. And the memory system has no idea this is happening.

"The memory controller is a black box. It responds to requests. It has no idea if those requests are latency-critical or bandwidth-hungry."

Today's DRAM systems operate on static timing margins set at boot from an SPD EEPROM chip. Those margins are conservative — designed for worst-case conditions across every workload that might ever run. A prefill sweep that needs maximum sustained bandwidth gets the same timing as a decode step that needs minimum latency. An idle system between inference requests keeps all its power circuits fully energized, burning watts for nothing.

I spent time deeply studying the Rambus and ARM memory architecture ecosystems, and I kept coming back to the same question: why doesn't the software tell the hardware what it's about to do?

Every LLM runtime knows exactly what phase it's in. It knows when prefill ends and decode begins. It knows when the agent is in a planning loop. Why isn't that information flowing down to the memory controller?

That gap is the invention.

Decode Latency

20%

reduction in P99 random-read latency for LLaMA-3 70B decode phase

Idle Power

39%

lower memory subsystem power during idle phases (2-socket, 16-DIMM system)

Transition Overhead

>95%

elimination of phase-transition retraining penalty (2.1ms → <0.1ms)

Prefill Bandwidth

6.5%

increase in sustained memory bandwidth during prefill across 8 DDR5-6400 channels

Memory system parameters adapted

tRCD tCL tRP tRAS V_swing DFE Tap 1–4 CTLE gain Per-lane skew Refresh rate PHY PLL freq

02 The Invention

A cross-layer bridge between AI software and memory physics

The patent — formally titled "System and Method for Software-Defined, Workload-Aware Adaptive Memory Signaling and Timing Control in Artificial Intelligence Computing Systems" — describes a four-component architecture that lets AI runtimes communicate workload phase information directly to memory hardware, which then pre-adjusts its signaling parameters before the next phase begins.

Fig. 1 — System Architecture

Component 1: Runtime Workload Classifier

Running inside the AI software stack — or as an OS kernel module — this component continuously samples performance counters at 100-microsecond intervals. It watches token emission rate, KV cache allocation bandwidth, system call patterns, and CPU C-state transitions. From these signals, it classifies the current execution into one of six phases.

Component 2: Workload Hint Interface

A 64-bit structured register write — transmitted via CPU Model-Specific Register, MMIO, or CXL DVSEC sideband — carries a compact hint from the software classifier to the firmware layer. The hint contains a phase identifier, latency target, bandwidth target, security level, and priority. This interface is the key novelty: it is the first time an AI runtime can express its memory semantics in hardware-readable form.

Component 3: Memory Policy Engine

A firmware component (in the CPU or memory controller ASIC) that holds a policy lookup table mapping phase identifiers to complete signaling configurations. It predicts upcoming transitions and schedules pre-adjustments 500 microseconds before a phase change — proactively, not reactively. It also runs a closed-loop feedback controller using ECC error rates and measured latency to continuously refine margins.

Component 4: Memory Interface Subsystem

The hardware layer — DDR5 controller, PHY circuitry, DRAM devices — that receives and applies the configurations. Every configuration is validated by an immutable hardware safety limiter anchored in the platform root of trust, ensuring no software can push the memory outside JEDEC-compliant bounds regardless of what any hint says.

          /* Workload Hint Interface
   64-bit packed structure */
struct mem_workload_hint {
  uint8_t  phase_id;
  /* 0x01 = Prefill
     0x02 = Decode
     0x03 = Agentic
     0x04 = Idle
     0x05 = ForwardPass
     0x06 = BackwardPass */

  uint16_t latency_target_ns;
  uint16_t bw_target_gbps;
  uint8_t  security_level;
  uint8_t  priority; /* 0–7 */
  uint8_t  reserved[2];
} __attribute__((packed));
/* Total: 8 bytes = 1 register write */
        

Transmission channels

MSR CPU Model-Specific Register 0xC001_XXXX

MMIO Memory controller config space

CXL DVSEC CXL device sideband register

03 Phase-Aware Adaptation

Every phase of AI execution has a different memory personality

The core insight is that LLM inference is not a single workload — it is at least four fundamentally different memory access patterns that happen to run on the same hardware. Treating them identically is leaving performance on the table.

Prefill

phase_id: 0x01Process entire input sequence in parallel. Massive sustained write bandwidth to KV cache.

tRCD: 22 clk (nominal)

V_swing: 300 mV

DFE Tap1: 0x10

Goal: Max bandwidth, link stability

Decode

phase_id: 0x02One token per forward pass. Latency-critical random reads from KV cache. Sub-100ns target.

tRCD: 18 clk (−4, tightened)

V_swing: 280 mV

DFE Tap1: 0x14 (boosted)

Goal: Minimum latency

Agentic

phase_id: 0x03Tool calls, API integration, multi-step planning. Irregular burst-mode access pattern.

tRCD: 20 clk (−2, balanced)

V_swing: 290 mV

DFE Tap1: 0x12

Goal: Burst tolerance

Idle

phase_id: 0x04Between inference requests. CPU in C6, near-zero memory command rate.

tRCD: 24 clk (+2, relaxed)

V_swing: 240 mV

DFE Tap1: 0x08 (reduced)

Goal: Minimum power

"The system applies configurations predictively — 500 microseconds before a phase transition — so the memory is already optimized when the first new-phase request arrives."

That 500-microsecond window is critical. Without predictive pre-adjustment, a conventional system would detect the phase change only after new-phase requests start arriving, then spend 2+ milliseconds retraining its PHY before achieving optimal performance. My system eliminates that penalty almost entirely — >95% reduction — because the hardware is already reconfigured when the transition happens.

For training workloads, two additional phases extend the framework: ForwardPass (0x05) receives bandwidth-optimised signaling, and BackwardPass (0x06) reduces write recovery time (tWR) to increase gradient write throughput. A GPU-side firmware agent receives synchronised hints via shared-memory IPC, enabling cross-device optimisation during distributed training steps.

04 Prior Art & Novelty

What exists — and why this is different

Five categories of prior art were analysed. None of them — individually or in combination — teach the cross-layer, predictive, software-defined architecture that this patent introduces.

Reference	What it does	Why it's different	Verdict
US20230195873A1 Dynamic DRAM Timing Adjustment	Adapts PHY timing based on measured signal integrity telemetry and die temperature	Entirely reactive — updates after degradation is detected. No software workload hints, no AI metrics, no predictive control.	✓ Distinguished
US11409612B2 Adaptive Refresh Controller	Adjusts DRAM refresh rate based on observed access patterns	Scope is exclusively refresh control. Does not address tRCD/tCL/tRP, voltage swing, equalization, or cross-layer coordination.	✓ Distinguished
CXL Spec v2.0/v3.0 CXL Consortium	Memory tiering, coherency protocols, latency-based migration policies	Does not disclose workload-hint-driven modification of PHY signaling parameters based on AI execution phase. Per-region PHY adaptation is not contemplated.	✓ Distinguished
AMD EXPO / Intel XMP Memory Profiles	Static overclocking profiles selectable at boot	Fixed at boot. Cannot adapt at runtime based on workload phase. No AI runtime integration.	✓ Distinguished
ISCA 2022 ML for DRAM Timing Optimization	ML inference on physical telemetry (eye closure, temperature) to predict safe margin reductions	Hardware-observable metrics only. Cannot distinguish decode from prefill — both look similar at the PHY level. No software-layer interface.	✓ Distinguished

Claim 1 — Independent Claim (System)

"A system for adaptive memory signaling control... wherein the memory signaling configuration is applied predictively, before a workload phase transition is completed, and all applied configurations are validated by a hardware safety limiter operating within a hardware trust boundary separate from the memory policy engine."

05 Technical Depth

40 claims across seven innovation dimensions

Safety — The Hardware Root of Trust

One of the most important design decisions was making the safety limiter genuinely immutable. If software can instruct the memory controller to change voltage and timing, it could in principle be used as a fault-injection attack vector — a software-triggered version of Rowhammer, or a thermal throttling attack against an adjacent security context.

The patent explicitly claims a hardware safety limiter implemented as immutable logic within the memory controller ASIC, anchored to the platform root of trust, that reads SPD EEPROM data at boot and clamps every proposed configuration to within JEDEC JESD79-5 compliant bounds. Software cannot override, bypass, or reprogram it at any privilege level.

CXL Fabric — Per-Region Independent Adaptation

Modern data centre memory is no longer a single flat DDR channel. CXL 2.0/3.0 enables pooled memory, memory expansion, and heterogeneous topologies where a single host may see local DDR5 DIMMs, CXL Type-2 accelerator-attached memory, and remote memory expansion nodes — all simultaneously.

The patent covers per-region independent PHY adaptation across CXL HDM regions, and adds a further embodiment: when CXL fabric link utilization exceeds 80%, the system relaxes CXL link timing to prioritise reliability, while simultaneously tightening local DDR5 margins to compensate for the lost fabric bandwidth. This congestion-responsive co-adaptation is not addressed in the CXL specification.

NUMA Multi-Socket — IPI Hint Propagation

In multi-socket servers, thread migration between NUMA domains is a common OS scheduler event. Without hint propagation, the destination socket would be left operating on stale signaling parameters for the duration of its next retraining interval. The patent claims simultaneous update of source and destination Memory Policy Engines via an inter-processor interrupt carrying the active workload hint — eliminating the NUMA-migration penalty entirely. Empirical modelling shows 12–18% cross-socket latency improvement from this alone.

Speculative Prefetch — Attention-Derived

When an LLM runtime knows which KV cache attention heads will be needed in the next decode step — which is often determinable from the current token's attention pattern — it can transmit a speculative prefetch hint identifying the anticipated DRAM row addresses. The memory controller issues speculative row activations before the actual read request arrives, eliminating the tRCD penalty (18 clocks = 5.6ns at DDR5-6400) for predictable accesses. Simulation on LLaMA-3 70B with grouped-query attention shows 8–14% additional P99 latency reduction on top of the base phase-aware system.

Claims Breakdown

System claims (apparatus) Claims 1–25, 31–40

Method claims Claims 26–30

Total independent claims 2

Drawings (figures) 5

Prior art references 5

Memory topologies covered

DDR5 / DDR6 MRDIMM (MRCD+MDB) HBM2e / HBM3 CXL 2.0 / 3.0 NVLink GPU Memory NUMA Multi-Socket

06 Industrial Applicability

Seven sectors where this matters commercially

01

Data Centre AI Inference

Inference-as-a-service platforms where per-server latency improvements aggregate to massive fleet-scale cost savings. Every 20% decode latency reduction translates directly to throughput and SLA improvements.

02

Edge AI Appliances

Where power budgets are constrained and battery life matters. The 39% idle-power reduction is commercially critical for always-on inference at the edge.

03

AI Training Clusters

Multi-GPU distributed training where forward/backward pass phase awareness reduces job completion time and energy consumption at scale.

04

CPU / SoC Manufacturers

ARM Neoverse V-series, x86-64 server CPUs, RISC-V AI SoCs — all benefit from integrating the Memory Policy Engine. A natural firmware-layer addition to next-generation memory controllers.

05

Memory Controller IP

Rambus, Synopsys, and other PHY IP vendors whose products are direct implementation vehicles for the claimed signaling adaptation. A natural licensing target.

06

DIMM / HBM Manufacturers

MRDIMM-specific MRCD+MDB programming embodiments provide a differentiating feature for next-generation DDR5 and HBM3 product lines.

07

Cloud Hyperscalers

At fleet scale, marginal per-server improvements become enormous in aggregate. A hyperscaler running 100,000 inference nodes sees the 39% idle-power reduction as a data centre power and cooling budget line item.

07 What Comes Next

The road from provisional to granted patent

✓

26 April 2026

Provisional Specification Filed

Application 202641053160 filed at the Indian Patent Office, Chennai. Fee of ₹1,600 paid. Receipt CBR No. 35104. Ref: TEMP/E1/58075/2026-CHE. 40 claims, 5 figures, 5 prior art references.

→

By 26 April 2027

Complete Specification Due

Under Section 9(1) of the Patents Act 1970, the complete specification must be filed within 12 months of the provisional filing date. This is a hard deadline — missing it results in abandonment of the application.

Recommended: Q3 2026

PCT International Application (Optional)

Filing a PCT application within 12 months of the provisional establishes an international filing date across 150+ countries. This preserves the option to prosecute in the US, EU, China, Japan, and Korea — where the primary commercial markets for memory controller IP and AI infrastructure exist.

~18–24 months post-complete

Examination & Grant

The Indian Patent Office typically completes examination within 2–3 years. A request for expedited examination (Form 18A) is available for applications meeting certain criteria and can accelerate this timeline.

Filing Details

Indian Patent Office · Chennai Jurisdiction

Application Number 202641053160

Reference Number TEMP/E1/58075/2026-CHE

Filing Date 26 April 2026

Docket / CBR No. 61982 / 35104

Transaction ID N-0001936456

Fee Paid ₹ 1,600 · Online Bank Transfer

Type Provisional Specification · Form 1

Applicant Manish Keshav Lachwani · Natural Person · Indian

Complete Spec Deadline 26 April 2027

08 Closing Thoughts

The memory wall is real. The moat is real.

The AI industry is in an arms race over compute. Chips, interconnects, cooling. But I believe the next phase of the race — the efficiency phase — will be won by the teams that close the gap between what the software knows and what the hardware does.

Memory is the bridge between compute and data. Right now that bridge is dumb: it doesn't know what's crossing it or why. This patent is about making it smart — giving it the information it needs to do its job 20% faster, 39% more efficiently, and without the performance cliffs that happen every time an AI workload changes gear.

The technical novelty is real. The prior art is clearly distinguished. The commercial applicability spans seven industry segments. And it's filed — patent pending in India as of today, with the option to go global within the year.

If you're working in memory controller IP, AI infrastructure, or LLM serving systems and want to talk about this, reach out.

Patent Pending Notice

This technology is the subject of Indian Patent Application No. 202641053160 filed as a provisional specification under the Patents Act, 1970 (39 of 1970) at the Indian Patent Office, Chennai, on 26 April 2026. All rights reserved. Patent Pending.

Quick Reference Summary

Title: System and Method for Software-Defined, Workload-Aware Adaptive Memory Signaling and Timing Control in AI Computing Systems

Core Innovation: Cross-layer workload hint interface enabling AI runtimes to predictively configure memory PHY signaling parameters

Key Result: 20% decode latency reduction, 39% idle power reduction, >95% transition penalty elimination

Memory Coverage: DDR5, MRDIMM, HBM2e/HBM3, CXL 2.0/3.0, NUMA multi-socket, multi-GPU

Claims: 40 total (2 independent, 38 dependent) across system and method

Safety: Immutable hardware root-of-trust safety limiter — JEDEC-compliant at all times

Filing: Indian Patent Office · Provisional · Application 202641053160

Inventor: Manish Keshav Lachwani, Bengaluru, India