202641043359 | 2026-04-04 | Inference Accelerator Memory
Summary
This filing introduces an architecture-agnostic hardware primitive that allows a privileged runtime to bind a bounded region of on-chip volatile memory to an off-chip backing address for a declared lifetime, with replacement logic structurally prevented from evicting the bound data during that interval.
Abstract
The invention extends the on-chip memory hierarchy with wired residency metadata, including a wired bit, tenant identifier, and generation counter per entry; replacement-controller logic that unconditionally excludes wired lines from victim selection; privileged BIND and RELEASE semantics; quota enforcement; immutable binding mode for read-only data; a soft-pin warning mechanism with configurable grace period; and a software residency arbiter that computes binding schedules from reuse distance and layer criticality across GPU-class accelerators, NPUs, TPUs, FPGAs, ASICs, and future inference architectures.
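The per-entry residency metadata and the replacement exclusion can be sketched as follows. This is a minimal illustrative model, not the filing's hardware design: the `Line` fields mirror the named metadata (wired bit, tenant identifier, generation counter), `select_victim` shows the structural exclusion of wired lines, and `bind` shows quota-enforced privileged BIND semantics. All names and the LRU policy are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Line:
    tag: int
    wired: bool = False     # wired bit: line is excluded from victim selection
    tenant: int = 0         # tenant identifier, used for quota enforcement
    generation: int = 0     # generation counter, bumped on each BIND
    last_use: int = 0       # LRU timestamp for ordinary replacement

def select_victim(ways):
    """LRU victim selection that structurally skips wired lines."""
    candidates = [w for w in ways if not w.wired]
    if not candidates:
        return None          # set fully wired: allocation must stall or bypass
    return min(candidates, key=lambda w: w.last_use)

def bind(line, tenant, quota_used, quota_limit):
    """Privileged BIND: wire a line only if the tenant's quota allows it."""
    if quota_used.get(tenant, 0) >= quota_limit:
        return False
    line.wired = True
    line.tenant = tenant
    line.generation += 1
    quota_used[tenant] = quota_used.get(tenant, 0) + 1
    return True
```

A RELEASE operation would clear the wired bit, decrement the tenant's quota, and bump the generation counter again so stale handles can be detected.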
202641043858 | 2026 filing | AI Cluster Reliability
System and Method for Cross-Layer Gray Failure Detection, Propagation-Risk Estimation, and Throughput-Preserving Orchestration in AI Compute Clusters
Summary
A seam-aware reliability orchestrator detects and contains gray failures in AI compute clusters by combining cross-layer telemetry with workload-aware control actions that preserve useful throughput before a hard outage occurs.
Abstract
The controller normalizes facilities, compute, fabric, storage, and runtime telemetry into a common time-series base, computes a Gray Failure Score (GFS), a Propagation Risk Score (PRS), and a Failure Amplification Estimate (FAE), and then chooses containment actions based on job criticality and useful-throughput impact. It can reclassify nodes, racks, paths, or storage backends into restricted admissibility states, switch checkpoint mode from full to incremental or in-memory, isolate degraded domains, reroute collective or inference traffic, and prioritize critical workloads so local partial degradation is converted into bounded local slowdown instead of cluster-wide interruption.
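The mapping from the three scores to containment actions can be sketched as a policy table. The thresholds below are invented for illustration only; the filing does not publish numeric cut-points, and the action names paraphrase those listed in the abstract.

```python
def choose_containment(gfs, prs, fae, job_criticality):
    """Pick the least disruptive action that still bounds propagation risk.

    gfs: Gray Failure Score, prs: Propagation Risk Score,
    fae: Failure Amplification Estimate. All thresholds are illustrative.
    """
    if gfs < 0.3:
        return "monitor"                       # healthy or noise-level signal
    if gfs < 0.6 and prs < 0.5:
        return "restrict_admission"            # stop new placements, keep jobs
    if prs >= 0.5 and fae > 1.0:
        return "isolate_domain"                # fence rack/path before spread
    if job_criticality == "critical":
        return "switch_checkpoint_incremental" # cheapen checkpoints, keep going
    return "drain_and_reroute"
```

The intent is graceful degradation: each branch converts a worsening signal into a bounded local slowdown rather than a cluster-wide interruption.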
202641045347 | 2026 filing | Memory-Centric AI Fabric
System and Method for Hardware-Resident Memory-Centric Orchestration of Multi-Tier Data Movement for Artificial Intelligence Compute Fabrics
Summary
A hardware-resident memory-centric fabric controller orchestrates AI data movement across SRAM, HBM, host DRAM, NVMe, and fabric-attached tiers using explicit software-defined memory intent instead of implicit cache inference.
Abstract
The invention maintains a Multi-Tier Residency Map in dedicated on-controller SRAM, uses saturating hardware regret counters for line-rate admission and eviction decisions, evaluates transfer-path cost versus arithmetic recomputation cost, and issues hardware transfer descriptors to distributed execution agents positioned near fabric and accelerator boundaries. In some embodiments the commands are cryptographically signed, and transfer completion is signaled through atomic PCIe or CXL doorbells so compute dispatch can resume entirely outside the host CPU operating-system hot path.
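Two of the named mechanisms are easy to sketch in isolation: a saturating regret counter of the kind used for line-rate admission decisions, and the transfer-cost-versus-recomputation-cost comparison. Bit widths, units, and the decision rule below are assumptions, not values from the filing.

```python
def saturating_update(counter, hit, bits=4):
    """Saturating regret counter: misses on previously evicted data raise
    regret; hits on retained data lower it. Clamped to the counter width."""
    hi = (1 << bits) - 1
    return max(counter - 1, 0) if hit else min(counter + 1, hi)

def move_or_recompute(bytes_to_move, link_gbps, recompute_flops, flops_per_s):
    """Admit a transfer only when moving the data across the fabric is
    cheaper than recomputing it near the consumer."""
    transfer_s = bytes_to_move / (link_gbps * 1e9 / 8)
    recompute_s = recompute_flops / flops_per_s
    return "transfer" if transfer_s < recompute_s else "recompute"
```

In the filing these decisions run in controller hardware at line rate; the Python form only shows the arithmetic.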
202641045998 | 2026 filing | Compiler Memory Intent IR
System and Method for Compiler-Emitted Memory Intent Intermediate Representation for Multi-Tier Artificial Intelligence Memory Orchestration
Summary
This filing introduces a first-class Memory Intent IR emitted by AI compiler systems so runtimes and hardware controllers can reason about object value, lifetime, phase, and movement policy using structured semantics rather than reactive generic caching heuristics.
Abstract
The invention relates to generating, encoding, transmitting, and consuming a compiler-emitted memory-behavior artifact for AI objects spanning on-chip SRAM, HBM, host DRAM, CXL-attached memory, pooled memory, peer-device memory, NVMe storage, and fabric-attached memory. It addresses deficiencies of conventional approaches including absence of a transferable memory-semantics artifact, reactive rather than anticipatory orchestration, object-class blindness, phase blindness, lack of coalition awareness, and excessive data movement and spill churn. The Memory Intent IR exposes object semantics such as sequence affinity, prefix-sharing value, stable reuse windows, recompute-cheap activations, routing-probability-driven expert hotness, and phase-specific lifetimes so downstream systems can orchestrate placement, migration, retention, replication, prefetch, spill, recomputation, and eviction with workload-aware precision.
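A single Memory Intent IR record might carry fields like the ones below. The field names and JSON encoding are illustrative assumptions; the point is that the artifact is structured and transferable, so a runtime or hardware controller can consume it without re-deriving object semantics.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class MemoryIntent:
    """One illustrative Memory Intent IR record for a single AI object."""
    object_id: str
    object_class: str       # e.g. "kv_prefix", "activation", "expert_weights"
    phase: str              # e.g. "prefill", "decode": phase-specific lifetime
    reuse_window_us: int    # stable reuse window the compiler inferred
    recompute_cheap: bool   # spill candidate: cheaper to recompute than reload
    expert_hotness: float   # routing-probability-driven hotness, 0..1

def emit_ir(intents):
    """Serialize records into a transferable memory-semantics artifact."""
    return json.dumps([asdict(i) for i in intents])
```

A downstream orchestrator would read these records to drive placement, prefetch, and eviction instead of inferring behavior from access patterns after the fact.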
202641042337 | 2026-04-02 | Enterprise Multi-Agent AI
Systems and Methods for Semantic Deduplication and Shared Execution of Agent-Generated Enterprise Tasks
Summary
An enterprise AI middleware layer collapses semantically equivalent agent tasks into one shared execution so organizations can cut duplicate compute, API calls, and backend load without modifying individual agent code.
Abstract
Canonical task fingerprints are produced through SQL normalization and transformer-based semantic encoding. A non-resetting admission window gathers clustered sister tasks before execution, while exact hash matching and approximate vector search identify duplicates and near-duplicates. Matching requests are subscribed to a single Shared Execution Unit, and the final result is fanned out to all originating agents through registered callbacks, with distributed locking and deduplication metrics supporting multi-node deployments.
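The fingerprint-then-fan-out flow can be sketched with exact-hash matching only. The normalization here is deliberately trivial (lowercasing, whitespace collapse); the filing additionally uses transformer-based semantic encoding and approximate vector search for near-duplicates, which this sketch omits.

```python
import hashlib

def fingerprint(sql):
    """Canonical task fingerprint via trivial SQL normalization."""
    canon = " ".join(sql.lower().split())
    return hashlib.sha256(canon.encode()).hexdigest()

class SharedExecutionUnit:
    """Gathers duplicate tasks during an admission window, executes once,
    and fans the single result out to every subscriber callback."""
    def __init__(self):
        self.subscribers = {}          # fingerprint -> list of callbacks

    def submit(self, sql, callback):
        self.subscribers.setdefault(fingerprint(sql), []).append(callback)

    def execute(self, sql, run):
        result = run(sql)              # single backend execution
        for cb in self.subscribers.pop(fingerprint(sql), []):
            cb(result)
```

Because the agents only register callbacks, none of their code changes: deduplication happens entirely in the middleware layer.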
202641041011 | 2026-03-31 | Multimodal Context Orchestration
Systems and Methods for Deterministic Staged Context Orchestration for Large Scale Multimodal AI Reasoning Systems
Summary
A deterministic context control plane manages active evidence in large reasoning systems through typed packets, named budget regions, semantic drift detection, and machine-readable audit records.
Abstract
The system combines Structured Evidence Packets, a Budget Allocation Controller with hard token floors, a Dynamic Re-Staging Engine that compares recent output embeddings with the active evidence set, and an Audit Ledger that records every admission and eviction operation. Together these mechanisms reduce repeated context reconstruction and context thrashing across successive reasoning steps.
202641039065 | 2026-03-29 | Long-Context LLM Inference
Predictive Context Region Residency and Attention Orchestration
Summary
This invention makes long-context inference adaptive by assigning both attention-computation state and memory-residency state per context region instead of treating all tokens as equally important.
Abstract
A runtime policy engine uses cross-attention density, semantic relevance, recency, positional criticality, structural landmarks, and retrieval likelihood to promote, demote, compress, summarize, or prefetch context regions. A coherence-veto guardrail prevents eviction of regions tied to recent outputs, allowing retrieval-augmented and code-repository inference systems to reduce attended context while preserving quality targets.
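The per-region decision, including the coherence veto, can be sketched as a scored policy. The weights, thresholds, and action names below are invented for illustration; the filing enumerates more signals (positional criticality, retrieval likelihood) than this toy combines.

```python
def region_action(attn_density, relevance, recency, is_landmark,
                  tied_to_recent_output):
    """Illustrative promote/compress/evict policy for one context region,
    with a coherence veto that blocks eviction of output-linked regions."""
    score = 0.5 * attn_density + 0.3 * relevance + 0.2 * recency
    if is_landmark:
        score += 0.2                   # structural landmarks resist demotion
    if score >= 0.6:
        return "promote"               # fully attended, hot residency
    if score >= 0.3:
        return "compress"              # summarize, keep retrievable
    if tied_to_recent_output:
        return "retain"                # coherence veto: no eviction
    return "evict"
```

The veto branch is the key safety property: a region that recent outputs depend on can be demoted in attention cost but never silently dropped.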
202641038857 | 2026-03-28 | Tiered Weight Memory
System and Method for Predictive Multi-Tier Weight Residency and Precision Orchestration for Neural-Network Inference
Summary
Neural-network weights are treated as live runtime state whose placement and precision can change across HBM, lower volatile tiers, and storage-backed tiers according to workload behavior.
Abstract
A policy engine evaluates reuse, routing likelihood, layer criticality, transfer cost, decompression cost, bandwidth pressure, and quality sensitivity to decide how each weight shard or expert block should be stored and staged. The controller schedules promotions, demotions, decompression, and predictive prefetch while enforcing precision floors for quality-sensitive blocks.
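A per-shard placement decision with a precision floor might look like the sketch below. The hotness formula, tier names, and bit widths are assumptions; the one property taken directly from the abstract is that quality-sensitive blocks never drop below their precision floor regardless of tier.

```python
def place_shard(reuse_rate, routing_prob, quality_sensitive, hbm_free,
                precision_floor_bits=8):
    """Choose tier and stored precision for one weight shard or expert block.
    Thresholds and tier names are illustrative."""
    hotness = reuse_rate * routing_prob
    if hotness > 0.5 and hbm_free:
        tier = "hbm"
    elif hotness > 0.1:
        tier = "dram"
    else:
        tier = "nvme"
    bits = 8 if tier == "hbm" else 4         # compress colder tiers harder
    if quality_sensitive:
        bits = max(bits, precision_floor_bits)  # enforce the precision floor
    return tier, bits
```

A scheduler built on this would batch the resulting promotions, demotions, and decompressions so they overlap with compute rather than stalling it.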
202641037509 | 2026-03-27 | KV Cache Systems
Systems and Methods for Deterministic Gather of Hierarchically Managed Key-Value State for Neural Network Inference
Summary
The core invention is deterministic gather of logically managed key-value state into execution-ready artifacts, supported by a residency-first hierarchical memory substrate.
Abstract
Key-value objects are given logical identities independent of physical placement and tracked across SRAM, HBM, host memory, remote memory, and storage-backed tiers. A deterministic gather engine resolves shared lineage, compiles a gather plan, overlaps retrieval across tiers, and emits execution-ready tiles or descriptors without exposing compute kernels to metadata complexity.
202641034620 | 2026-03-23 | Computational Structural Biology
Method and System for Class-I Peptide-MHC Structural Normalization and Wild-Type-Relative Mutation Tracking
Summary
A deterministic structural normalization pipeline converts peptide-MHC class I structure files into stable wild-type-relative mutation fingerprints for downstream computational analysis.
Abstract
Wild-type and mutant structures are parsed into residue-level chains, assigned fixed semantic roles, and transformed into peptide-to-heavy-chain contact records under explicit distance criteria. The system computes a normalized delta relative to wild type and serializes a machine-readable fingerprint object containing mutation identity, chain mapping status, and contact-derived fields.
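The contact-record and delta steps can be sketched with toy coordinates. Residues are reduced to `(name, (x, y, z))` pairs and the 4.0 Å cutoff is an assumed example of the "explicit distance criteria"; a real pipeline would parse full PDB/mmCIF chains and handle chain-mapping status.

```python
import math

def contacts(peptide, heavy_chain, cutoff=4.0):
    """Peptide-to-heavy-chain contact records under a distance criterion."""
    out = []
    for pname, p in peptide:
        for hname, h in heavy_chain:
            d = math.dist(p, h)
            if d <= cutoff:
                out.append((pname, hname, round(d, 2)))
    return out

def mutation_fingerprint(wt_contacts, mut_contacts):
    """Normalized delta relative to wild type: contacts gained and lost."""
    wt = {(a, b) for a, b, _ in wt_contacts}
    mu = {(a, b) for a, b, _ in mut_contacts}
    return {"gained": sorted(mu - wt), "lost": sorted(wt - mu)}
```

Serializing this dictionary alongside mutation identity and chain-mapping status yields the machine-readable fingerprint object the abstract describes.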
202641031926 | 2026-03-17 | Accelerator Architecture
SRMIC-X1: SRAM-Residency Memory-Centric Inference Chip - A Residency-First LLM Decode Accelerator Architecture
Summary
SRMIC-X1 is a residency-first inference accelerator that treats the active per-token working set as a hardware primitive and prioritizes hot SRAM residency over HBM-bound decode behavior.
Abstract
The architecture combines a distributed Hot Residency Memory SRAM tier, a high-bandwidth SRMESH-X interconnect, an HBM cold tier, and an optional CXL warm tier. Hardware residency controllers execute promote, demote, pin, spill, and multicast operations using per-page metadata, while bounded per-region working-set rules keep decode-critical service time stable.
202641029764 | 2026-03-12 | Kernel Autotuning
Systems and Methods for Noise-Aware Benchmarking and Optimization of AI-Generated Compute Kernels
Summary
This filing defines a reliability-gated benchmarking loop for generated accelerator kernels so noisy measurements do not drive bad kernel-selection decisions.
Abstract
Benchmark trials are classified by telemetry-derived contamination state, unresolved finalists receive additional benchmark budget, and a kernel is promoted only if it remains superior under acceptable measurement conditions. The inventive chain is contamination classification, targeted reruns, and gated promotion.
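The gated-promotion step can be sketched as follows. Here each trial is a `(latency_ms, contaminated)` pair, the minimum-clean-trial count of three is an invented policy value, and the contamination flag stands in for the telemetry-derived classification the filing describes.

```python
import statistics

def gated_promotion(candidates, baseline_ms):
    """Promote a generated kernel only when it beats the baseline on
    uncontaminated trials alone; candidates maps name -> trial list."""
    best, best_ms = "baseline", baseline_ms
    for name, trials in candidates.items():
        clean = [ms for ms, contaminated in trials if not contaminated]
        if len(clean) < 3:
            continue        # unresolved finalist: grant more benchmark budget
        med = statistics.median(clean)
        if med < best_ms:
            best, best_ms = name, med
    return best
```

Kernels whose evidence is mostly contaminated are never promoted on noisy wins; they stay in the rerun queue until enough clean trials accumulate.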
202641026356 | 2026-03-06 | AI Audit and Enforcement
Systems and Methods for Enforced Immutable Reasoning Event Logging Using Isolated Memory Tiers
Summary
This invention makes reasoning-event logging mandatory for AI agent actions by tying tool execution to an immutable, hardware-isolated audit pathway.
Abstract
An Action Intent Capsule must be written to a hardware-isolated append-only tier and validated through a Log Commit Token before any tool gateway can authorize execution. A strict propose, log-commit, execute, and finalize pipeline, combined with tamper-evident data structures, creates an intent-to-effect audit trail.
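The log-commit gate can be sketched with an HMAC standing in for the hardware-issued Log Commit Token; the real filing relies on a hardware-isolated append-only tier rather than an in-process list, and the key handling here is purely illustrative.

```python
import hashlib
import hmac

LEDGER = []                  # stand-in for the isolated append-only tier
KEY = b"ledger-secret"       # illustrative; a real token comes from hardware

def log_commit(capsule: bytes) -> str:
    """Append the Action Intent Capsule, return a Log Commit Token."""
    LEDGER.append(capsule)   # append-only: entries are never rewritten
    return hmac.new(KEY, capsule, hashlib.sha256).hexdigest()

def tool_gateway(capsule: bytes, token: str, action):
    """Refuse execution unless the token proves the capsule was logged."""
    expected = hmac.new(KEY, capsule, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(token, expected):
        raise PermissionError("no valid Log Commit Token: execution denied")
    return action()
```

Because the gateway verifies the token before running the action, logging cannot be skipped or deferred: the intent record provably precedes the effect.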
202641026034 | 2026-03-05 | Compiler and I/O Scheduling
Unified Storage-Aware Accelerator Compilation with Deterministic I/O Windows and Bounded Stall Guarantees
Summary
Storage and network movement are elevated into accelerator compilation so compute windows and data windows can be aligned with deterministic slack and bounded stall behavior.
Abstract
The compiler models storage jitter, network jitter, congestion, HBM occupancy, and synchronization constraints, then emits deadline-aware DMA descriptors and aligned compute and I/O windows. Runtime enforcement state machines can enter bounded pause and checkpoint states when misses exceed allowed slack, reducing tail-latency amplification across heterogeneous clusters.
202641024338 | 2026-03-01 | Alignment Training Safety
Regression-Firewalled Alignment Training: Micro-Canaries, Safe-Merge, Reward-Hacking Controls, and Drift Barriers
Summary
A regression firewall for alignment post-training evaluates candidate updates against capability canaries and can accept, constrain, partially merge, or reject updates based on regression risk.
Abstract
Micro-canaries are dynamically refreshed under token and runtime budgets, hard and soft capability thresholds are enforced, and Safe-Merge logic partitions update deltas into mergeable components such as layer groups or low-rank bases. Reward-hacking detection and drift-triggered certification provide additional control signals.
202641022212 | 2026-02-25 | Distributed Control Plane
Deterministic Cross-Layer Invariant-Preserving Control Plane for Distributed Accelerator Clusters
Summary
This control plane models collective execution as a bounded temporal synchronization problem and allows infrastructure mutation only at synchronization-safe boundaries.
Abstract
Telemetry correlation and invariant evaluation engines project violation horizons across layers, and a deterministic orchestration controller authorizes routing updates, memory remapping, DMA pacing, or scheduler changes only when corrective latency remains safely bounded. The result is lower risk of stall amplification cascades.
202641019782 | 2026-02-20 | GPU Preemption
System and Method for Transparent Suspend Resume Preemption of GPU Compute Workloads via Context Quiescence
Summary
This filing enables scheduler-driven suspend and resume of GPU workloads without requiring application-level checkpoint logic.
Abstract
The system identifies safe points, snapshots device memory and reconstruction metadata, releases GPU resources for higher-priority work, and later restores the workload deterministically. Scheduler-visible telemetry such as snapshot size and restore latency supports policy decisions in responsive GPU clusters.
202641017350 | 2026-02-17 | HBM Residency Management
System and Method for Confidence-Gated HBM Residency Management with Thrash-Budgeted Prefetch, Pinning, and Eviction Control
Summary
A confidence-gated HBM controller uses telemetry, uncertainty bounds, and thrash budgets to decide when speculative residency actions are safe.
Abstract
The system forecasts near-term object reuse from page faults, DMA activity, stall time, allocator state, fragmentation indicators, and memory pressure. HBM is partitioned into pinned, speculative, staging, and reserve regions, and prefetch, pin, eviction, and compaction are permitted only while fault and churn budgets remain within policy.
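The thrash-budget gate can be sketched as a small stateful check: speculative actions are permitted only while fault and eviction churn stay inside the window's budget and the reuse forecast clears a confidence floor. Budget sizes and the 0.7 floor are invented policy values.

```python
class ThrashBudget:
    """Permit speculative prefetch/pin actions only while churn for the
    current window stays inside the policy budget."""
    def __init__(self, max_faults, max_evictions):
        self.max_faults = max_faults
        self.max_evictions = max_evictions
        self.faults = 0
        self.evictions = 0

    def allow_speculation(self, confidence, floor=0.7):
        within = (self.faults < self.max_faults
                  and self.evictions < self.max_evictions)
        return within and confidence >= floor

    def record(self, faults=0, evictions=0):
        """Charge observed churn against the current window's budget."""
        self.faults += faults
        self.evictions += evictions
```

Once either budget is exhausted the controller falls back to conservative, non-speculative residency management until the window resets.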
202641016678 | 2026-02-15 | AI Data Center Orchestration
System and Method for Workload-Phase-Aware Data Hydration and Thermal-Power Orchestration in High-Density AI Infrastructure
Summary
This invention coordinates predictive data hydration with GPU workload phases, thermal limits, and facility power headroom inside AI infrastructure.
Abstract
An interceptor observes I/O metadata and optional GPU telemetry, a classifier predicts workload phases, and a hydration planner stages future objects into fast tiers, optionally with edge decompression. A probabilistic safety gate and power-thermal scheduler suppress speculative actions when cache, network, SLA, or facility constraints are at risk.
202641015526 | 2026-02-12 | Deterministic Inference
Deterministic Memory-Orchestrated Neural Network Inference Using Scheduled DMA Transfers and Bounded Buffers
Summary
This filing turns DRAM into a deterministic streaming substrate by staging model data into bounded on-chip buffers ahead of compute.
Abstract
A compiler and runtime tile model parameters, program DMA descriptors with dependency fences, and overlap transfer with execution through double- or multi-buffering. The architecture removes off-chip memory from the steady-state compute critical path and produces predictable latency and power behavior.
202641007393 | 2026-01-25 | Vehicle Platform
A Low-Profile, Drive-On Vehicle Repositioning Platform with Deployable Omni-Directional Drive Module
Summary
A low-profile platform enables a parked vehicle to be translated laterally and rotated without modifying the vehicle.
Abstract
A load-bearing deck, limited-stroke lift, and deployable omni-directional drive bogies are coordinated by a controller that enforces load confirmation, tilt limits, slip-aware control, braking, and lock engagement. Optional embodiments add sealed enclosures, water-detection interlocks, solar-ready docking, auto-home alignment, and multi-unit coordination.
202641005501 | 2026-01-20 | Edge AI Scheduling
SLA-Constrained Energy-Aware Inference Scheduling Using Memory Residency, DMA Transfer Policy, Variant Selection, and Performance-State Control on ARM Edge Systems
Summary
This invention targets edge AI systems that must minimize energy per inference while still satisfying latency constraints.
Abstract
A controller running on an ARM-based system predicts latency and energy for candidate execution policies and selects the best one under current request and hardware conditions. Policy dimensions include SRAM residency, no-waste DMA prefetch, precompiled graph-variant selection, and performance-state choices such as DVFS, with runtime telemetry continuously improving the predictive models.
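The selection step reduces to a constrained minimization: among candidate policies whose predicted latency meets the SLA, pick the lowest predicted energy. The tuple layout and fallback rule below are illustrative assumptions.

```python
def select_policy(policies, latency_sla_ms):
    """Pick the lowest predicted-energy policy that meets the latency SLA.
    Each policy is (name, predicted_latency_ms, predicted_energy_mj)."""
    feasible = [p for p in policies if p[1] <= latency_sla_ms]
    if not feasible:
        # No policy meets the SLA: degrade gracefully to the fastest one.
        return min(policies, key=lambda p: p[1])[0]
    return min(feasible, key=lambda p: p[2])[0]
```

Each candidate name would encode one point in the policy space (SRAM residency choice, DMA prefetch plan, graph variant, DVFS state), with the latency and energy predictions refined over time by runtime telemetry.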