Summary
Overview
A regression firewall for alignment post-training evaluates candidate updates against capability canaries and can accept, constrain, partially merge, or reject updates based on regression risk.
Abstract
Technical Abstract
Micro-canaries are dynamically refreshed under token and runtime budgets, hard and soft capability thresholds are enforced, and Safe-Merge logic partitions update deltas into mergeable components such as layer groups or low-rank bases. Reward-hacking detection and drift-triggered certification provide additional control signals.
Search Context
SEO Keywords
alignment training patent, micro canary patent, safe merge patent, reward hacking control patent, model regression patent
Related Patents
More Patents in AI agents, alignment, and enterprise orchestration
These filings sit nearby in the portfolio and strengthen internal linking across related patent topics.