Overview
Neural-network weights are treated as live runtime state whose placement and precision can change across high-bandwidth memory (HBM), lower volatile tiers such as DRAM, and storage-backed tiers according to workload behavior.
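A minimal sketch of what "weights as live runtime state" could look like: each shard carries a tier and a precision that can change at runtime. The tier names, shard fields, and `demote` helper are illustrative assumptions, not the filing's actual data model.

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    HBM = 0       # fastest, most capacity-constrained
    DRAM = 1      # lower volatile tier
    STORAGE = 2   # storage-backed tier

@dataclass
class WeightShard:
    name: str
    tier: Tier
    bits: int     # current precision, e.g. 16, 8, or 4

def demote(shard: WeightShard, to: Tier, bits: int) -> WeightShard:
    # Demotion may lower precision; a later promotion would restore it.
    shard.tier, shard.bits = to, bits
    return shard

s = demote(WeightShard("expert_7.ffn", Tier.HBM, 16), Tier.DRAM, 8)
print(s.tier.name, s.bits)  # DRAM 8
```

The point of the sketch is only that placement and precision are mutable per-shard attributes rather than fixed properties of the model file.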
Technical Abstract
A policy engine evaluates reuse, routing likelihood, layer criticality, transfer cost, decompression cost, bandwidth pressure, and quality sensitivity to decide how each weight shard or expert block should be stored and staged. The controller schedules promotions, demotions, decompression, and predictive prefetch while enforcing precision floors for quality-sensitive blocks.
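The scoring described above can be sketched as a simple benefit/cost ratio over the listed signals, with a separate precision-floor check for quality-sensitive blocks. The function names, weighting scheme, and the 8-bit floor are hypothetical illustrations, not the claimed policy.

```python
def residency_score(reuse, routing_prob, criticality,
                    transfer_cost, decomp_cost, bw_pressure):
    """Higher score -> keep in (or promote to) a faster tier."""
    benefit = reuse * routing_prob * (1.0 + criticality)
    cost = transfer_cost + decomp_cost + bw_pressure
    return benefit / (1.0 + cost)

def choose_bits(requested_bits, quality_sensitive, floor_bits=8):
    # Precision floor: quality-sensitive blocks never drop below floor_bits.
    return max(requested_bits, floor_bits) if quality_sensitive else requested_bits

# A frequently routed, critical expert scores higher than a cold one,
# so the controller would keep it resident rather than demote it.
hot = residency_score(reuse=0.9, routing_prob=0.8, criticality=1.0,
                      transfer_cost=0.1, decomp_cost=0.0, bw_pressure=0.2)
cold = residency_score(reuse=0.1, routing_prob=0.2, criticality=0.0,
                       transfer_cost=0.5, decomp_cost=0.3, bw_pressure=0.2)
print(hot > cold)            # True
print(choose_bits(4, True))  # 8: floor overrides the requested 4-bit demotion
```

In this sketch, promotion, demotion, and prefetch decisions would all consume the same score; only the thresholds and scheduling differ.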
Search Context
SEO Keywords
weight residency patent, neural network weight orchestration patent, HBM patent, memory hierarchy patent, inference precision patent
Related Patents
More patents in memory residency, KV systems, and deterministic inference
These filings sit nearby in the portfolio and strengthen internal linking across related patent topics.