Summary
Overview
This invention makes long-context inference adaptive by assigning each context region its own attention-computation state and memory-residency state, rather than treating all tokens as equally important.
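As a minimal sketch of the dual-state idea (all type and field names here are hypothetical; the filing does not specify data structures), each region can carry two independent labels:

```python
from dataclasses import dataclass
from enum import Enum, auto

class AttentionState(Enum):
    """How much attention computation a region receives (hypothetical labels)."""
    FULL = auto()        # attended by every layer/head
    SPARSE = auto()      # attended by a reduced subset of heads/layers
    SKIPPED = auto()     # excluded from attention at this step

class ResidencyState(Enum):
    """Where a region's KV/memory state lives (hypothetical labels)."""
    RESIDENT = auto()    # kept in accelerator memory
    COMPRESSED = auto()  # quantized or summarized in place
    OFFLOADED = auto()   # paged to host/disk, eligible for prefetch

@dataclass
class ContextRegion:
    """A contiguous token span managed as one unit."""
    start: int                 # first token index (inclusive)
    end: int                   # last token index (exclusive)
    attention: AttentionState  # computation state
    residency: ResidencyState  # memory state
```

Because the two labels are independent, a region can, for example, stay resident in memory while being skipped for attention, or be attended sparsely from a compressed representation.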
Abstract
Technical Abstract
A runtime policy engine scores context regions on cross-attention density, semantic relevance, recency, positional criticality, structural landmarks, and retrieval likelihood, and uses those scores to promote, demote, compress, summarize, or prefetch each region. A coherence-veto guardrail blocks eviction of regions tied to recent outputs, allowing retrieval-augmented and code-repository inference systems to reduce the attended context while still meeting quality targets.
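As a minimal sketch of how such a policy engine might combine these signals and enforce the veto (the weights, thresholds, action mapping, and all function names below are assumptions, not the claimed method):

```python
from typing import Dict, Set

# Hypothetical per-signal weights; the abstract names the signals
# but does not specify how they are combined.
WEIGHTS = {
    "cross_attention_density": 0.30,
    "semantic_relevance": 0.25,
    "recency": 0.15,
    "positional_criticality": 0.15,
    "structural_landmark": 0.10,
    "retrieval_likelihood": 0.05,
}

def region_score(signals: Dict[str, float]) -> float:
    """Weighted combination of normalized [0, 1] signals (assumed linear form)."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)

def choose_action(score: float) -> str:
    """Map a score to one of the actions named in the abstract (thresholds assumed)."""
    if score >= 0.75:
        return "promote"    # full attention, keep resident
    if score >= 0.50:
        return "prefetch"   # likely needed soon; page back in
    if score >= 0.30:
        return "compress"   # keep, but in reduced form
    if score >= 0.15:
        return "summarize"  # replace with a short summary
    return "demote"         # candidate for eviction

def apply_policy(region_id: int,
                 signals: Dict[str, float],
                 recent_output_refs: Set[int]) -> str:
    """Coherence veto: never evict a region tied to recent outputs."""
    action = choose_action(region_score(signals))
    if action == "demote" and region_id in recent_output_refs:
        return "compress"   # veto eviction; fall back to a non-destructive action
    return action
```

For instance, a region whose score would otherwise trigger `demote` but which appears in `recent_output_refs` falls back to `compress`, preserving coherence with recent outputs at reduced memory cost.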