Use the application-number links above to jump directly to the corresponding summary and abstract below.
202641062451 | 2026-05-17 | GhostKV / Query-Time KV Elimination
GHOSTKV: A System and Method for Query-Time Bounded Elimination of Reconstructable Key-Value Witnesses in Transformer Attention Mechanisms
Summary
This filing introduces GhostKV, a reconstruction-first transformer-memory architecture in which cold KV-cache entries are converted into compact witness records and are eliminated or resurrected at query time before full memory movement occurs. Instead of compressing and loading every historical KV entry, the system uses conservative upper bounds to decide whether a token can be safely ignored for the current decode step, turning bandwidth preservation itself into the core systems primitive.
Abstract
Cold KV-cache entries are transformed into compact ghost records comprising an attention sketch, a semantic anchor identifier, and a residual uncertainty fingerprint. For each decode step, a runtime computes a conservative upper bound on the possible attention contribution of each ghost record relative to the current query vector. Ghost tokens whose bound falls below an elimination threshold are skipped entirely, avoiding memory transfer, decompression, and exact attention computation for those entries; ghost tokens whose bound survives are resurrected through one or more reconstruction paths before exact attention is computed over the surviving candidate set. The invention thereby provides query-adaptive, bounded elimination of reconstructable key-value witnesses before memory movement occurs, distinguishing itself from KV quantization, eviction-based sparse KV methods, and approximate nearest-neighbor retrieval approaches.
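The bound-then-skip step described above can be sketched in Python. This is a minimal illustration, not the filing's method: it assumes the ghost record's attention sketch is nothing more than the L2 norm of the cold key, and it uses the Cauchy-Schwarz inequality as the conservative upper bound on the pre-softmax logit. The names `GhostRecord`, `logit_upper_bound`, and `select_survivors` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class GhostRecord:
    token_id: int
    key_norm: float  # assumption: the sketch is only the L2 norm of the cold key


def logit_upper_bound(query, ghost):
    """Cauchy-Schwarz: q.k <= ||q|| * ||k||, so the bound never underestimates."""
    q_norm = math.sqrt(sum(x * x for x in query))
    return q_norm * ghost.key_norm


def select_survivors(query, ghosts, threshold):
    """Keep token ids whose bound reaches the threshold; the rest are never loaded."""
    return [g.token_id for g in ghosts if logit_upper_bound(query, g) >= threshold]


query = [3.0, 4.0]  # L2 norm is 5.0
ghosts = [GhostRecord(0, 0.1), GhostRecord(1, 2.0), GhostRecord(2, 0.5)]
survivors = select_survivors(query, ghosts, threshold=2.0)
print(survivors)  # [1, 2]
```

Only the surviving ids would then be resurrected and passed to exact attention. A tighter sketch would skip more tokens, but any replacement must remain an upper bound for the elimination to stay safe.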
202641062302 | 2026-05-16 | Attention-Sink KV Placement
Methods and Systems for Attention-Sink-Aware SRAM Placement of Key-Value State in Transformer Inference
Summary
This filing introduces a transformer-aware memory-placement mechanism that identifies tokens with persistent attention-sink behavior and selectively promotes their key-value state into SRAM or another fast memory tier. Instead of treating all KV-cache entries as equally important, the system exploits the fact that a small subset of tokens attracts disproportionate attention during decode and therefore deserves privileged low-latency placement.
Abstract
During prefill, the inference system monitors attention scores across heads and layers to derive sink metrics for tokens that repeatedly attract attention mass, such as beginning-of-sequence anchors, system-prompt tokens, separators, retrieval anchors, or other structurally important positions. A placement controller maintains token-to-tier mapping metadata and causes KV state for identified sink tokens to be mirrored, copied, pinned, or promoted from a larger-capacity tier such as HBM or DRAM into a lower-latency, higher-bandwidth tier such as SRAM. During autoregressive decode, a dedicated bypass path services sink-token KV reads from the fast tier while non-sink-token state continues to be served from the bulk memory tier, reducing repeated long-latency reads, lowering bandwidth pressure, and improving time-per-token latency, power efficiency, and serving throughput. Embodiments cover threshold-based post-prefill placement, decode-time exponential-moving-average sink detection, hardware-controller implementations with tag arrays and bypass multiplexers, head-wise and layer-wise selective promotion, speculative decoding, and multi-tenant shared-prefix reuse.
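The decode-time exponential-moving-average sink detection mentioned above might look like the following Python sketch. The smoothing factor, threshold, fast-tier capacity, and function names are all assumptions made for illustration; the filing does not state them.

```python
def update_sink_scores(scores, attention_mass, alpha=0.5):
    """Blend each token's newest attention mass into its running EMA score."""
    tokens = set(scores) | set(attention_mass)
    return {
        tok: alpha * attention_mass.get(tok, 0.0) + (1 - alpha) * scores.get(tok, 0.0)
        for tok in tokens
    }


def choose_fast_tier(scores, threshold, capacity):
    """Pick up to `capacity` tokens whose EMA score meets the threshold."""
    eligible = [tok for tok, score in scores.items() if score >= threshold]
    eligible.sort(key=lambda tok: (-scores[tok], tok))  # highest score first
    return eligible[:capacity]


# Attention mass per token position over three decode steps (made-up numbers).
steps = [
    {0: 0.6, 1: 0.1, 2: 0.3},
    {0: 0.7, 1: 0.05, 2: 0.25},
    {0: 0.5, 3: 0.4, 2: 0.1},
]
scores = {}
for mass in steps:
    scores = update_sink_scores(scores, mass)

fast = choose_fast_tier(scores, threshold=0.18, capacity=2)
print(fast)  # [0, 3]
```

Token 0 stays promoted because it attracts mass at every step, which is the persistent-sink behavior the placement controller is meant to exploit.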
202641059412 | 2026-05-10 | Inference Weight Delivery
Weight-Aware Sequencer for Inference-Time Model-Weight Delivery with Compiler-Generated Hot-Tile Semantics, Pressure-Adaptive Fanout, and Weight/Key-Value Memory Separation
Summary
This filing argues that model weights in modern neural-network inference should be treated as a distinct systems object rather than as generic memory traffic. The invention introduces a hardware weight-aware sequencer that interprets compiler-generated weight-tile semantics, fetches and stages model-weight tiles into local SRAM without CPU-directed per-transfer commands in the hot path, and adapts replication behavior under runtime pressure while remaining operationally separate from the key-value cache path.
Abstract
A compiler or planning engine partitions model weights into tiles and generates a weight map containing semantic metadata including hot-tile class, predicted reuse count, default fanout, runtime override policy, and key-value pressure sensitivity. A dedicated hardware sequencer comprising a weight-map register file, tile-prefetch scheduler, replication or multicast controller, and staging-buffer reservation manager reads the weight map and runtime pressure signals to autonomously fetch, stage, and selectively replicate model-weight tiles into local SRAM staging buffers for inference execution. Replication fanout can be reduced, deferred, or otherwise modified in response to staging-buffer pressure or key-value-path congestion, while a weight-designated memory path remains distinct from the key-value-cache memory path. In some embodiments, a persistent read-mostly weight tier such as MRAM enables instant-on inference by keeping model weights resident across power cycles.
202641059276 | 2026-05-09 | SmartNIC / Agentic AI Hardware
Hardware SmartNIC Architecture for Agentic Artificial Intelligence Workloads with Intent-Parsing, Policy-Bounded Autonomous Dataplane Engines, and Memory-Orchestration Hardware
Summary
This filing introduces an AgentNIC SmartNIC architecture that moves agent-aware infrastructure control into dedicated hardware. Instead of leaving agent intent parsing, queue steering, retry containment, memory movement, and compliance logging to host software, the device processes agent-level metadata in silicon and autonomously performs bounded dataplane actions under hardware policy control.
Abstract
A physical SmartNIC device implements an agent-intent parser, intent descriptor tables, a hardware policy enforcement engine, an agent-aware queue scheduler, a bounded autonomous dataplane engine, a memory-orchestration DMA subsystem, retry-amplification suppression logic, and an audit-chain logging block. The hardware classifies operations using fields such as agent identity, trust class, inference-session state, retry lineage, latency budget, workflow state, memory-transfer intent, and audit requirement, then selectively permits queue assignment, transport selection, memory movement, backoff, quarantine, or escalation while generating verifiable audit records for autonomous actions. Filing details recorded in the e-filing receipt include provisional Form 1 submission, receipt timestamp 2026-05-09 20:11:41, and filing reference TEMP/E-1/64615/2026-CHE.
202641053160 | 2026-04-26 | Adaptive Memory Signaling
System and Method for Software-Defined, Workload-Aware Adaptive Memory Signaling and Timing Control in Artificial Intelligence Computing Systems
Summary
This filing introduces a cross-layer control path that lets AI runtimes communicate workload phase semantics directly to the memory subsystem so signaling, timing, equalization, and power settings can be predictively adapted for prefill, decode, agentic loop, and idle behavior instead of remaining locked to conservative boot-time margins.
Abstract
A Runtime Workload Classifier observes application-level metrics such as token emission rate, KV-cache growth, kernel scheduling patterns, and CPU power state, then emits structured workload hints through a privileged interface to a Memory Policy Engine. The policy engine maps those hints to low-level interface controls including DRAM timing parameters such as tRCD, tCL, tRP, and tRAS, differential voltage swing, DFE tap coefficients, and CTLE settings across DDR5, MRDIMM, HBM, and CXL-attached memory. The adaptation is predictive rather than reactive, with closed-loop feedback from ECC events, read-retry counters, and measured latency, while an immutable hardware safety limiter keeps all adjustments inside JEDEC- and SPD-defined bounds.
202641056309 | 2026-05-04 | KV-CPU / KV Cache Architecture
KV-Cache Companion Processing Unit (KV-CPU): A Closed-Loop AI-Native Memory Compute Architecture Providing Near-Memory Attention Score Reduction, Decode-Step-Aware Hardware Eviction, Request-Isolated KV Block Orchestration, and Hardware-Enforced Kernel Memory Tier Integration for Transformer Inference Acceleration
Summary
This filing introduces a dedicated KV-Cache Companion Processing Unit, or KV-CPU, that occupies an intermediate memory tier for large-language-model inference and closes the loop between near-memory attention-score reduction, decode-step-aware hardware eviction and prefetch, request-isolated KV block tracking, and kernel-mediated memory-tier control. Instead of treating KV-cache management as a software-only residency problem, the architecture turns decode-step feedback into a hardware control cycle that continuously decides what stays local, what moves, and how shared prefix state is reused across concurrent requests.
Abstract
The invention discloses a KV-CPU attached over CXL 3.0 or PCIe 5.0 that combines a Near-Memory Compute Engine (NMCE) for attention-score reduction on LPDDR5X-resident key vectors, a Hardware Eviction and Prefetch Controller (HEPC) that updates block priority and transfer scheduling at decode-step timescales, and a Request-Tagged Block Directory (RTBD) that tracks physical placement, sharing state, and request isolation for KV blocks across tiers. A Linux kernel driver exposes MADV_KV_HOT, MADV_KV_EVICT, MADV_KV_PREFETCH, and io_uring-based migration controls that terminate in direct hardware state updates, allowing user-space runtimes to pin, evict, prefetch, and migrate KV blocks without software-managed critical-path orchestration. The core invention is the closed-loop coordination among compute output, residency metadata, and hardware movement policy for long-context transformer inference acceleration.
202641057517 | 2026-05-06 | MoE Neural Processors
System and Method for Predictive Expert Weight Staging, Sparse-Dispatch Memory Orchestration, and Memory-Bounded Execution in Mixture-of-Experts Neural Processors
Summary
This filing moves mixture-of-experts execution control into hardware by predicting expert demand from routing semantics, staging expert weights across SRAM, HBM, and lower tiers, and admitting sparse expert execution only when the required memory state and bandwidth budgets are ready.
Abstract
A hardware control subsystem observes router-derived expert-selection metadata before dispatch materialization is complete, generates expert-demand state, and uses that state to promote, retain, replicate, demote, or prefetch expert weights across multiple memory tiers. A sparse-dispatch regrouping path forms expert-wise dispatch units and gates descriptor emission on execution-valid expert windows, while admission logic enforces memory-bounded execution by requiring corresponding expert-state validity and reserved transfer or execution budgets. Training embodiments preserve forward-pass expert-demand information to guide backward-pass, gradient, optimizer, rematerialization, or replay-related staging, and deterministic scheduling embodiments allocate bounded staging and dispatch slots for expert groups within a hardware-enforced scheduling horizon.
202641049625 | 2026-04-18 | Agent Memory Controller
Agent Memory Controller: A Hardware-Accelerated Data Processing Unit Architecture for Orchestration Offload, Structured-Response Validation, and State-Aware Key-Value Cache Management in Multi-Turn Large Language Model Inference Systems
Summary
This filing repositions the DPU as an Agent Memory Controller for multi-turn LLM systems, offloading structured-response validation, agent session orchestration, predictive KV-cache movement, and peer-to-peer transfer into a dedicated hardware control plane instead of the host CPU.
Abstract
The invention describes a DPU-based architecture that maintains per-session finite-state machines, performs hardware-accelerated validation of structured tool responses, manages KV-cache placement across GPU HBM, DPU-local memory, and CXL-attached tiers, and issues direct peer-to-peer DMA transfers to GPU memory without host CPU involvement. By combining validation engines, session-state control, KV prefetch logic, and transfer orchestration on the DPU, the system reduces PCIe crossings, removes CPU touchpoints from the critical path, and improves throughput for agentic, tool-using, multi-turn inference workloads.
202641043359 | 2026-04-04 | Inference Accelerator Memory
Summary
This filing introduces an architecture-agnostic hardware primitive that allows a privileged runtime to bind a bounded region of on-chip volatile memory to an off-chip backing address for a declared lifetime, with replacement logic structurally prevented from evicting the bound data during that interval.
Abstract
The invention extends the on-chip memory hierarchy with wired residency metadata, including a wired bit, tenant identifier, and generation counter per entry; replacement-controller logic that unconditionally excludes wired lines from victim selection; privileged BIND and RELEASE semantics; quota enforcement; immutable binding mode for read-only data; a soft-pin warning mechanism with configurable grace period; and a software residency arbiter that computes binding schedules from reuse distance and layer criticality across GPU-class accelerators, NPUs, TPUs, FPGAs, ASICs, and future inference architectures.
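A software model can illustrate the replacement rule that excludes wired lines from victim selection. The sketch below assumes an LRU policy over a tiny fully associative cache; the class `WiredCache` and its `bind`, `release`, and `access` methods are hypothetical names, and the tenant identifier, generation counter, and quota logic from the abstract are left out.

```python
from collections import OrderedDict


class WiredCache:
    """Toy LRU cache whose victim selection never picks a wired line."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> wired flag, least recent first

    def access(self, addr):
        """Touch addr; return the evicted address, or None if nothing was evicted."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return None
        victim = None
        if len(self.lines) >= self.capacity:
            # Oldest line that is NOT wired; wired lines are skipped unconditionally.
            victim = next((a for a, wired in self.lines.items() if not wired), None)
            if victim is None:
                raise RuntimeError("every line is wired; a quota should prevent this")
            del self.lines[victim]
        self.lines[addr] = False
        return victim

    def bind(self, addr):
        self.access(addr)
        self.lines[addr] = True

    def release(self, addr):
        if addr in self.lines:
            self.lines[addr] = False


cache = WiredCache(capacity=2)
cache.bind("A")
cache.access("B")
first_victim = cache.access("C")   # "A" is oldest but wired, so "B" is evicted
cache.release("A")
second_victim = cache.access("D")  # "A" is now an ordinary line and goes first
print(first_victim, second_victim)  # B A
```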
202641043858 | 2026 filing | AI Cluster Reliability
System and Method for Cross-Layer Gray Failure Detection, Propagation-Risk Estimation, and Throughput-Preserving Orchestration in AI Compute Clusters
Summary
A seam-aware reliability orchestrator detects and contains gray failures in AI compute clusters by combining cross-layer telemetry with workload-aware control actions that preserve useful throughput before a hard outage occurs.
Abstract
The controller normalizes facilities, compute, fabric, storage, and runtime telemetry into a common time-series base, computes a Gray Failure Score (GFS), a Propagation Risk Score (PRS), and a Failure Amplification Estimate (FAE), and then chooses containment actions based on job criticality and useful-throughput impact. It can reclassify nodes, racks, paths, or storage backends into restricted admissibility states, switch checkpoint mode from full to incremental or in-memory, isolate degraded domains, reroute collective or inference traffic, and prioritize critical workloads so local partial degradation is converted into bounded local slowdown instead of cluster-wide interruption.
202641045347 | 2026 filing | Memory-Centric AI Fabric
System and Method for Hardware-Resident Memory-Centric Orchestration of Multi-Tier Data Movement for Artificial Intelligence Compute Fabrics
Summary
A hardware-resident memory-centric fabric controller orchestrates AI data movement across SRAM, HBM, host DRAM, NVMe, and fabric-attached tiers using explicit software-defined memory intent instead of implicit cache inference.
Abstract
The invention maintains a Multi-Tier Residency Map in dedicated on-controller SRAM, uses saturating hardware regret counters for line-rate admission and eviction decisions, evaluates transfer-path cost versus arithmetic recomputation cost, and issues hardware transfer descriptors to distributed execution agents positioned near fabric and accelerator boundaries. In some embodiments the commands are cryptographically signed, and transfer completion is signaled through atomic PCIe or CXL doorbells so compute dispatch can resume entirely outside the host CPU operating-system hot path.
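As a rough illustration of two ideas in this abstract, the Python sketch below pairs a saturating counter with a transfer-versus-recompute cost comparison. The abstract does not say what event drives the regret counters or how they gate admission, so the rule here (admit once the counter reaches `admit_at`) is an assumption, as are all names and cost figures.

```python
class SaturatingCounter:
    """Counter that clamps at 0 and at 2**bits - 1 instead of wrapping."""

    def __init__(self, bits=2):
        self.limit = (1 << bits) - 1
        self.value = 0

    def increment(self):
        self.value = min(self.limit, self.value + 1)

    def decrement(self):
        self.value = max(0, self.value - 1)


def plan_access(regret, transfer_cost_us, recompute_cost_us, admit_at=2):
    """Pick the cheaper way to obtain the object and decide whether to admit it."""
    action = "recompute" if recompute_cost_us < transfer_cost_us else "transfer"
    admit = regret.value >= admit_at  # assumed admission rule
    return action, admit


hot = SaturatingCounter(bits=2)
for _ in range(5):  # five re-requests saturate a 2-bit counter at 3
    hot.increment()
cold = SaturatingCounter(bits=2)

hot_plan = plan_access(hot, transfer_cost_us=40, recompute_cost_us=15)
cold_plan = plan_access(cold, transfer_cost_us=10, recompute_cost_us=50)
print(hot.value, hot_plan, cold_plan)
```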
202641045998 | 2026 filing | Compiler Memory Intent IR
System and Method for Compiler-Emitted Memory Intent Intermediate Representation for Multi-Tier Artificial Intelligence Memory Orchestration
Summary
This filing introduces a first-class Memory Intent IR emitted by AI compiler systems so runtimes and hardware controllers can reason about object value, lifetime, phase, and movement policy using structured semantics rather than reactive generic caching heuristics.
Abstract
The invention relates to generating, encoding, transmitting, and consuming a compiler-emitted memory-behavior artifact for AI objects spanning on-chip SRAM, HBM, host DRAM, CXL-attached memory, pooled memory, peer-device memory, NVMe storage, and fabric-attached memory. It addresses deficiencies of conventional approaches including absence of a transferable memory-semantics artifact, reactive rather than anticipatory orchestration, object-class blindness, phase blindness, lack of coalition awareness, and excessive data movement and spill churn. The Memory Intent IR exposes object semantics such as sequence affinity, prefix-sharing value, stable reuse windows, recompute-cheap activations, routing-probability-driven expert hotness, and phase-specific lifetimes so downstream systems can orchestrate placement, migration, retention, replication, prefetch, spill, recomputation, and eviction with workload-aware precision.
202641042337 | 2026-04-02 | Enterprise Multi-Agent AI
Systems and Methods for Semantic Deduplication and Shared Execution of Agent-Generated Enterprise Tasks
Summary
An enterprise AI middleware layer collapses semantically equivalent agent tasks into one shared execution so organizations can cut duplicate compute, API calls, and backend load without modifying individual agent code.
Abstract
Canonical task fingerprints are produced through SQL normalization and transformer-based semantic encoding. A non-resetting admission window gathers clustered sister tasks before execution, while exact hash matching and approximate vector search identify duplicates and near-duplicates. Matching requests are subscribed to a single Shared Execution Unit, and the final result is fanned out to all originating agents through registered callbacks, with distributed locking and deduplication metrics supporting multi-node deployments.
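The exact-match half of this pipeline can be sketched in a few lines of Python. The normalization below (trim, drop a trailing semicolon, collapse whitespace, lowercase) is a deliberately crude stand-in for real SQL normalization; lowercasing would also alter string literals. The transformer-based encoding, approximate vector search, admission window, and distributed locking are omitted. `fingerprint` and `SharedExecutor` are hypothetical names.

```python
import hashlib
import re


def fingerprint(sql):
    """Crude canonical form followed by a SHA-256 hash (assumed normalization)."""
    text = sql.strip().rstrip(";").strip()
    text = re.sub(r"\s+", " ", text).lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class SharedExecutor:
    """Runs each distinct fingerprint once and fans the result out to subscribers."""

    def __init__(self, run):
        self.run = run
        self.pending = {}  # fingerprint -> (first sql seen, list of callbacks)

    def submit(self, sql, callback):
        entry = self.pending.setdefault(fingerprint(sql), (sql, []))
        entry[1].append(callback)

    def flush(self):
        executions = 0
        for sql, callbacks in self.pending.values():
            result = self.run(sql)
            executions += 1
            for callback in callbacks:
                callback(result)
        self.pending.clear()
        return executions


received = []
executor = SharedExecutor(run=lambda sql: "rows")
executor.submit("SELECT id FROM orders;", received.append)
executor.submit("select  id\nfrom orders", received.append)
executor.submit("SELECT id FROM users", received.append)
executions = executor.flush()
print(executions, len(received))  # 2 3
```

Three agent requests produce two backend executions, and every originating callback still receives a result.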
202641041011 | 2026-03-31 | Multimodal Context Orchestration
Systems and Methods for Deterministic Staged Context Orchestration for Large Scale Multimodal AI Reasoning Systems
Summary
A deterministic context control plane manages active evidence in large reasoning systems through typed packets, named budget regions, semantic drift detection, and machine-readable audit records.
Abstract
The system combines Structured Evidence Packets, a Budget Allocation Controller with hard token floors, a Dynamic Re-Staging Engine that compares recent output embeddings with the active evidence set, and an Audit Ledger that records every admission and eviction operation. Together these mechanisms reduce repeated context reconstruction and context thrashing across successive reasoning steps.
202641039065 | 2026-03-29 | Long-Context LLM Inference
Predictive Context Region Residency and Attention Orchestration
Summary
This invention makes long-context inference adaptive by assigning both attention-computation state and memory-residency state per context region instead of treating all tokens as equally important.
Abstract
A runtime policy engine uses cross-attention density, semantic relevance, recency, positional criticality, structural landmarks, and retrieval likelihood to promote, demote, compress, summarize, or prefetch context regions. A coherence-veto guardrail prevents eviction of regions tied to recent outputs, allowing retrieval-augmented and code-repository inference systems to reduce attended context while preserving quality targets.
202641038857 | 2026-03-28 | Tiered Weight Memory
System and Method for Predictive Multi-Tier Weight Residency and Precision Orchestration for Neural-Network Inference
Summary
Neural-network weights are treated as live runtime state whose placement and precision can change across HBM, lower volatile tiers, and storage-backed tiers according to workload behavior.
Abstract
A policy engine evaluates reuse, routing likelihood, layer criticality, transfer cost, decompression cost, bandwidth pressure, and quality sensitivity to decide how each weight shard or expert block should be stored and staged. The controller schedules promotions, demotions, decompression, and predictive prefetch while enforcing precision floors for quality-sensitive blocks.
202641037509 | 2026-03-27 | KV Cache Systems
Systems and Methods for Deterministic Gather of Hierarchically Managed Key-Value State for Neural Network Inference
Summary
The core invention is deterministic gather of logically managed key-value state into execution-ready artifacts, supported by a residency-first hierarchical memory substrate.
Abstract
Key-value objects are given logical identities independent of physical placement and tracked across SRAM, HBM, host memory, remote memory, and storage-backed tiers. A deterministic gather engine resolves shared lineage, compiles a gather plan, overlaps retrieval across tiers, and emits execution-ready tiles or descriptors without exposing compute kernels to metadata complexity.
202641034620 | 2026-03-23 | Computational Structural Biology
Method and System for Class-I Peptide-MHC Structural Normalization and Wild-Type-Relative Mutation Tracking
Summary
A deterministic structural normalization pipeline converts peptide-MHC class I structure files into stable wild-type-relative mutation fingerprints for downstream computational analysis.
Abstract
Wild-type and mutant structures are parsed into residue-level chains, assigned fixed semantic roles, and transformed into peptide-to-heavy-chain contact records under explicit distance criteria. The system computes a normalized delta relative to wild type and serializes a machine-readable fingerprint object containing mutation identity, chain mapping status, and contact-derived fields.
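A minimal Python sketch of the contact-record and wild-type-relative delta steps follows. It assumes one representative coordinate per residue and a 4.0 angstrom cutoff; the filing only says that explicit distance criteria are used, so both choices, along with the residue labels and function names, are illustrative.

```python
import math


def contacts(peptide, heavy_chain, cutoff=4.0):
    """Peptide-to-heavy-chain residue pairs within `cutoff` angstroms (assumed value)."""
    return {
        (p_res, h_res)
        for p_res, p_xyz in peptide.items()
        for h_res, h_xyz in heavy_chain.items()
        if math.dist(p_xyz, h_xyz) <= cutoff
    }


def contact_delta(wild_type, mutant):
    """Contacts gained and lost by the mutant relative to wild type."""
    return {"gained": sorted(mutant - wild_type), "lost": sorted(wild_type - mutant)}


heavy = {"H:63": (0.0, 0.0, 0.0), "H:66": (5.0, 0.0, 0.0)}
wild_peptide = {"P:2": (0.5, 0.0, 0.0)}    # close to H:63 only
mutant_peptide = {"P:2": (4.5, 0.0, 0.0)}  # shifted toward H:66

delta = contact_delta(contacts(wild_peptide, heavy), contacts(mutant_peptide, heavy))
print(delta)
```

The resulting dictionary is the kind of contact-derived field that could be serialized into the fingerprint object alongside mutation identity and chain mapping status.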
202641031926 | 2026-03-17 | Accelerator Architecture
SRMIC-X1: SRAM-Residency Memory-Centric Inference Chip - A Residency-First LLM Decode Accelerator Architecture
Summary
SRMIC-X1 is a residency-first inference accelerator that treats the active per-token working set as a hardware primitive and prioritizes hot SRAM residency over HBM-bound decode behavior.
Abstract
The architecture combines a distributed Hot Residency Memory SRAM tier, a high-bandwidth SRMESH-X interconnect, an HBM cold tier, and an optional CXL warm tier. Hardware residency controllers execute promote, demote, pin, spill, and multicast operations using per-page metadata, while bounded per-region working-set rules keep decode-critical service time stable.
202641029764 | 2026-03-12 | Kernel Autotuning
Systems and Methods for Noise-Aware Benchmarking and Optimization of AI-Generated Compute Kernels
Summary
This filing defines a reliability-gated benchmarking loop for generated accelerator kernels so noisy measurements do not drive bad kernel-selection decisions.
Abstract
Benchmark trials are classified by telemetry-derived contamination state, unresolved finalists receive additional benchmark budget, and a kernel is promoted only if it remains superior under acceptable measurement conditions. The inventive chain is contamination classification, targeted reruns, and gated promotion.
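The chain of contamination classification, targeted reruns, and gated promotion can be sketched in Python as below. The telemetry fields (`temp_c`, `clock_drop`), the thresholds, the minimum number of clean trials, and the 2% win margin are all assumptions for illustration.

```python
from statistics import median


def is_clean(trial, max_temp_c=80.0, max_clock_drop=0.05):
    """Assumed contamination rule: too hot or throttled means the trial is discarded."""
    return trial["temp_c"] <= max_temp_c and trial["clock_drop"] <= max_clock_drop


def decide(candidate_trials, incumbent_trials, min_clean=3, margin=0.02):
    """Return 'rerun', 'promote', or 'keep' using clean trials only."""
    cand = [t["ms"] for t in candidate_trials if is_clean(t)]
    inc = [t["ms"] for t in incumbent_trials if is_clean(t)]
    if len(cand) < min_clean or len(inc) < min_clean:
        return "rerun"  # not enough trustworthy data: spend more benchmark budget
    return "promote" if median(cand) < median(inc) * (1 - margin) else "keep"


def trial(ms, temp_c=70.0, clock_drop=0.0):
    return {"ms": ms, "temp_c": temp_c, "clock_drop": clock_drop}


incumbent = [trial(10.0), trial(10.1), trial(9.9)]
candidate = [trial(9.0), trial(9.1), trial(7.0, temp_c=90.0)]  # fastest run is contaminated

first = decide(candidate, incumbent)
candidate.append(trial(9.2))  # targeted rerun under clean conditions
second = decide(candidate, incumbent)
print(first, second)  # rerun promote
```

The contaminated 7.0 ms trial never influences the decision, which is the point of gating promotion on measurement quality.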
202641026356 | 2026-03-06 | AI Audit and Enforcement
Systems and Methods for Enforced Immutable Reasoning Event Logging Using Isolated Memory Tiers
Summary
This invention makes reasoning-event logging mandatory for AI agent actions by tying tool execution to an immutable, hardware-isolated audit pathway.
Abstract
An Action Intent Capsule must be written to a hardware-isolated append-only tier and validated through a Log Commit Token before any tool gateway can authorize execution. A strict propose, log-commit, execute, and finalize pipeline, combined with tamper-evident data structures, creates an intent-to-effect audit trail.
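The propose, log-commit, execute ordering can be modeled in software, although a Python list is obviously not a hardware-isolated tier. In this sketch the Log Commit Token is assumed to be the SHA-256 digest of a hash-chained entry, which also gives a simple tamper-evident structure; the class and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64


def _digest(prev, body):
    return hashlib.sha256((prev + body).encode("utf-8")).hexdigest()


class AppendOnlyLog:
    """Hash-chained log; each entry's digest covers the previous digest."""

    def __init__(self):
        self.entries = []

    def commit(self, capsule):
        prev = self.entries[-1]["digest"] if self.entries else GENESIS
        body = json.dumps(capsule, sort_keys=True)
        entry = {"prev": prev, "body": body, "digest": _digest(prev, body)}
        self.entries.append(entry)
        return entry["digest"]  # assumed form of the Log Commit Token

    def verify(self):
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or _digest(prev, entry["body"]) != entry["digest"]:
                return False
            prev = entry["digest"]
        return True


def gateway_execute(log, capsule, token, tool):
    """Run the tool only if this exact capsule was committed under this token."""
    body = json.dumps(capsule, sort_keys=True)
    if not any(e["digest"] == token and e["body"] == body for e in log.entries):
        raise PermissionError("no matching log commit for this action")
    return tool(**capsule["args"])


log = AppendOnlyLog()
capsule = {"agent": "agent-7", "tool": "add", "args": {"a": 2, "b": 3}}
token = log.commit(capsule)
result = gateway_execute(log, capsule, token, lambda a, b: a + b)
print(result, log.verify())  # 5 True
```

Calling `gateway_execute` with a token that was never issued raises `PermissionError`, so the tool cannot run ahead of its log entry.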
202641026034 | 2026-03-05 | Compiler and I/O Scheduling
Unified Storage-Aware Accelerator Compilation with Deterministic I/O Windows and Bounded Stall Guarantees
Summary
Storage and network movement are elevated into accelerator compilation so compute windows and data windows can be aligned with deterministic slack and bounded stall behavior.
Abstract
The compiler models storage jitter, network jitter, congestion, HBM occupancy, and synchronization constraints, then emits deadline-aware DMA descriptors and aligned compute and I/O windows. Runtime enforcement state machines can enter bounded pause and checkpoint states when misses exceed allowed slack, reducing tail-latency amplification across heterogeneous clusters.
202641024338 | 2026-03-01 | Alignment Training Safety
Regression-Firewalled Alignment Training: Micro-Canaries, Safe-Merge, Reward-Hacking Controls, and Drift Barriers
Summary
A regression firewall for alignment post-training evaluates candidate updates against capability canaries and can accept, constrain, partially merge, or reject updates based on regression risk.
Abstract
Micro-canaries are dynamically refreshed under token and runtime budgets, hard and soft capability thresholds are enforced, and Safe-Merge logic partitions update deltas into mergeable components such as layer groups or low-rank bases. Reward-hacking detection and drift-triggered certification provide additional control signals.
202641022212 | 2026-02-25 | Distributed Control Plane
Deterministic Cross-Layer Invariant-Preserving Control Plane for Distributed Accelerator Clusters
Summary
This control plane models collective execution as a bounded temporal synchronization problem and allows infrastructure mutation only at synchronization-safe boundaries.
Abstract
Telemetry correlation and invariant evaluation engines project violation horizons across layers, and a deterministic orchestration controller authorizes routing updates, memory remapping, DMA pacing, or scheduler changes only when corrective latency remains safely bounded. The result is lower risk of stall amplification cascades.
202641019782 | 2026-02-20 | GPU Preemption
System and Method for Transparent Suspend Resume Preemption of GPU Compute Workloads via Context Quiescence
Summary
This filing enables scheduler-driven suspend and resume of GPU workloads without requiring application-level checkpoint logic.
Abstract
The system identifies safe points, snapshots device memory and reconstruction metadata, releases GPU resources for higher-priority work, and later restores the workload deterministically. Scheduler-visible telemetry such as snapshot size and restore latency supports policy decisions in responsive GPU clusters.
202641017350 | 2026-02-17 | HBM Residency Management
System and Method for Confidence-Gated HBM Residency Management with Thrash-Budgeted Prefetch, Pinning, and Eviction Control
Summary
A confidence-gated HBM controller uses telemetry, uncertainty bounds, and thrash budgets to decide when speculative residency actions are safe.
Abstract
The system forecasts near-term object reuse from page faults, DMA activity, stall time, allocator state, fragmentation indicators, and memory pressure. HBM is partitioned into pinned, speculative, staging, and reserve regions, and prefetch, pin, eviction, and compaction are permitted only while fault and churn budgets remain within policy.
202641016678 | 2026-02-15 | AI Data Center Orchestration
System and Method for Workload-Phase-Aware Data Hydration and Thermal-Power Orchestration in High-Density AI Infrastructure
Summary
This invention coordinates predictive data hydration with GPU workload phases, thermal limits, and facility power headroom inside AI infrastructure.
Abstract
An interceptor observes I/O metadata and optional GPU telemetry, a classifier predicts workload phases, and a hydration planner stages future objects into fast tiers, optionally with edge decompression. A probabilistic safety gate and power-thermal scheduler suppress speculative actions when cache, network, SLA, or facility constraints are at risk.
202641015526 | 2026-02-12 | Deterministic Inference
Deterministic Memory-Orchestrated Neural Network Inference Using Scheduled DMA Transfers and Bounded Buffers
Summary
This filing turns DRAM into a deterministic streaming substrate by staging model data into bounded on-chip buffers ahead of compute.
Abstract
A compiler and runtime partition model parameters into tiles, program DMA descriptors with dependency fences, and overlap transfer with execution through double- or multi-buffering. The architecture removes off-chip memory from the steady-state compute critical path and produces predictable latency and power behavior.
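The effect of overlapping transfer with execution can be checked with a small timing model in Python. It assumes every tile has the same load and compute time, a single DMA engine, and a single compute unit; `pipeline_finish_time` is a hypothetical helper, not part of the filing.

```python
def pipeline_finish_time(n_tiles, load, compute, n_buffers=2):
    """Finish time when tile i can load only once a bounded buffer is free."""
    load_done, compute_done = [], []
    for i in range(n_tiles):
        prev_load = load_done[i - 1] if i >= 1 else 0.0
        # The buffer used by tile i was last held by tile i - n_buffers.
        buffer_free = compute_done[i - n_buffers] if i >= n_buffers else 0.0
        load_done.append(max(prev_load, buffer_free) + load)
        prev_compute = compute_done[i - 1] if i >= 1 else 0.0
        compute_done.append(max(load_done[i], prev_compute) + compute)
    return compute_done[-1] if compute_done else 0.0


serial = pipeline_finish_time(4, load=2.0, compute=3.0, n_buffers=1)
double = pipeline_finish_time(4, load=2.0, compute=3.0, n_buffers=2)
print(serial, double)  # 20.0 14.0
```

With one buffer every load waits for the previous compute, giving 4 x (2 + 3) = 20 time units. With two buffers only the first load is exposed, giving 2 + 4 x 3 = 14, so off-chip transfers drop out of the steady-state critical path whenever compute time is at least the load time.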
202641007393 | 2026-01-25 | Vehicle Platform
A Low-Profile, Drive-On Vehicle Repositioning Platform with Deployable Omni-Directional Drive Module
Summary
A low-profile platform enables a parked vehicle to be translated laterally and rotated without modifying the vehicle.
Abstract
A load-bearing deck, limited-stroke lift, and deployable omni-directional drive bogies are coordinated by a controller that enforces load confirmation, tilt limits, slip-aware control, braking, and lock engagement. Optional embodiments add sealed enclosures, water-detection interlocks, solar-ready docking, auto-home alignment, and multi-unit coordination.
202641005501 | 2026-01-20 | Edge AI Scheduling
SLA-Constrained Energy-Aware Inference Scheduling Using Memory Residency, DMA Transfer Policy, Variant Selection, and Performance-State Control on ARM Edge Systems
Summary
This invention targets edge AI systems that must minimize energy per inference while still satisfying latency constraints.
Abstract
A controller running on an ARM-based system predicts latency and energy for candidate execution policies and selects the best one under current request and hardware conditions. Policy dimensions include SRAM residency, no-waste DMA prefetch, precompiled graph-variant selection, and performance-state choices such as DVFS, with runtime telemetry continuously improving the predictive models.
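The selection step can be sketched in Python as a constrained minimum: among policies predicted to meet the latency SLA, take the one with the lowest predicted energy. The policy names and the predicted numbers are invented for illustration, and falling back to the fastest policy when nothing meets the SLA is an assumption.

```python
def select_policy(policies, latency_sla_ms):
    """Lowest-energy policy within the SLA; otherwise the fastest available one."""
    feasible = [p for p in policies if p["latency_ms"] <= latency_sla_ms]
    if not feasible:
        return min(policies, key=lambda p: p["latency_ms"])  # assumed fallback
    return min(feasible, key=lambda p: p["energy_mj"])


# Predicted latency and energy per candidate execution policy (made-up values).
policies = [
    {"name": "sram-resident/high-dvfs", "latency_ms": 8.0, "energy_mj": 30.0},
    {"name": "dma-prefetch/mid-dvfs", "latency_ms": 14.0, "energy_mj": 18.0},
    {"name": "small-variant/low-dvfs", "latency_ms": 25.0, "energy_mj": 9.0},
]

relaxed = select_policy(policies, latency_sla_ms=30.0)["name"]
typical = select_policy(policies, latency_sla_ms=15.0)["name"]
strict = select_policy(policies, latency_sla_ms=5.0)["name"]
print(relaxed, typical, strict)
```

Tightening the SLA pushes the choice toward faster, more energy-hungry policies, which is the trade-off the controller navigates as request and hardware conditions change.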